Description

MULTICHANNEL AUDIO CODING

Technical Field 

The invention relates generally to audio signal processing. The invention is
particularly useful in low bitrate and very low bitrate audio signal processing. More
particularly, aspects of the invention relate to an encoder (or encoding process), a decoder
(or decoding process), and to an encode/decode system (or encoding/decoding process)
for audio signals in which a plurality of audio channels is represented by a composite
monophonic ("mono") audio channel and auxiliary ("sidechain") information.
Alternatively, the plurality of audio channels is represented by a plurality of audio
channels and sidechain information. Aspects of the invention also relate to a
multichannel to composite monophonic channel downmixer (or downmix process), to a
monophonic channel to multichannel upmixer (or upmixer process), and to a monophonic
channel to multichannel decorrelator (or decorrelation process). Other aspects of the
invention relate to a multichannel-to-multichannel downmixer (or downmix process), to a
multichannel-to-multichannel upmixer (or upmix process), and to a decorrelator (or
decorrelation process).

Background Art 

In the AC-3 digital audio encoding and decoding system, channels may be
selectively combined or "coupled" at high frequencies when the system becomes starved
for bits. Details of the AC-3 system are well known in the art - see, for example: ATSC
Standard A52/A: Digital Audio Compression Standard (AC-3), Revision A, Advanced
Television Systems Committee, 20 Aug. 2001. The A/52A document is available on the
World Wide Web at http://www.atsc.org/standards.html . The A/52A document is hereby
incorporated by reference in its entirety.

The frequency above which the AC-3 system combines channels on demand is
referred to as the "coupling" frequency. Above the coupling frequency, the coupled
channels are combined into a "coupling" or composite channel. The encoder generates
"coupling coordinates" (amplitude scale factors) for each subband above the coupling
frequency in each channel. The coupling coordinates indicate the ratio of the original
energy of each coupled channel subband to the energy of the corresponding subband in
the composite channel. Below the coupling frequency, channels are encoded discretely.
The phase polarity of a coupled channel's subband may be reversed before the channel is
combined with one or more other coupled channels in order to reduce out-of-phase signal
component cancellation. The composite channel along with sidechain information that
includes, on a per-subband basis, the coupling coordinates and whether the channel's
phase is inverted, are sent to the decoder. In practice, the coupling frequencies employed
in commercial embodiments of the AC-3 system have ranged from about 10 kHz to about
3500 Hz. U.S. Patents 5,583,962; 5,633,981; 5,727,119; 5,909,664; and 6,021,386
include teachings that relate to the combining of multiple audio channels into a composite
channel and auxiliary or sidechain information and the recovery therefrom of an
approximation to the original multiple channels. Each of said patents is hereby
incorporated by reference in its entirety.

Disclosure of the Invention 

Aspects of the present invention may be viewed as improvements upon the
"coupling" techniques of the AC-3 encoding and decoding system and also upon other
techniques in which multiple channels of audio are combined either to a monophonic
composite signal or to multiple channels of audio along with related auxiliary information
and from which multiple channels of audio are reconstructed. Aspects of the present
invention also may be viewed as improvements upon techniques for downmixing multiple
audio channels to a monophonic audio signal or to multiple audio channels and for
decorrelating multiple audio channels derived from a monophonic audio channel or from
multiple audio channels.

Aspects of the invention may be employed in an N:1:N spatial audio coding
technique (where "N" is the number of audio channels) or an M:1:N spatial audio coding
technique (where "M" is the number of encoded audio channels and "N" is the number of
decoded audio channels) that improve on channel coupling, by providing, among other
things, improved phase compensation, decorrelation mechanisms, and signal-dependent
variable time-constants. Aspects of the present invention may also be employed in N:x:N
and M:x:N spatial audio coding techniques wherein "x" may be 1 or greater than 1.
Goals include the reduction of coupling cancellation artifacts in the encode process by
adjusting relative interchannel phase before downmixing, and improving the spatial
dimensionality of the reproduced signal by restoring the phase angles and degrees of
decorrelation in the decoder. Aspects of the invention when embodied in practical
embodiments should allow for continuous rather than on-demand channel coupling and
lower coupling frequencies than, for example, in the AC-3 system, thereby reducing the
required data rate.

Description of the Drawings 
FIG. 1 is an idealized block diagram showing the principal functions or devices of
an N:1 encoding arrangement embodying aspects of the present invention.

FIG. 2 is an idealized block diagram showing the principal functions or devices of
a 1:N decoding arrangement embodying aspects of the present invention.

FIG. 3 shows an example of a simplified conceptual organization of bins and
subbands along a (vertical) frequency axis and blocks and a frame along a (horizontal)
time axis. The figure is not to scale.

FIG. 4 is in the nature of a hybrid flowchart and functional block diagram
showing encoding steps or devices performing functions of an encoding arrangement
embodying aspects of the present invention.

FIG. 5 is in the nature of a hybrid flowchart and functional block diagram
showing decoding steps or devices performing functions of a decoding arrangement
embodying aspects of the present invention.

FIG. 6 is an idealized block diagram showing the principal functions or devices of
a first N:x encoding arrangement embodying aspects of the present invention.

FIG. 7 is an idealized block diagram showing the principal functions or devices of
an x:M decoding arrangement embodying aspects of the present invention.

FIG. 8 is an idealized block diagram showing the principal functions or devices of
a first alternative x:M decoding arrangement embodying aspects of the present invention.

FIG. 9 is an idealized block diagram showing the principal functions or devices of
a second alternative x:M decoding arrangement embodying aspects of the present
invention.

Best Mode for Carrying Out the Invention

Basic N:1 Encoder

Referring to FIG. 1, an N:1 encoder function or device embodying aspects of the
present invention is shown. The figure is an example of a function or structure that
performs as a basic encoder embodying aspects of the invention. Other functional or
structural arrangements that practice aspects of the invention may be employed, including
alternative and/or equivalent functional or structural arrangements described below.

Two or more audio input channels are applied to the encoder. Although, in
principle, aspects of the invention may be practiced by analog, digital or hybrid
analog/digital embodiments, examples disclosed herein are digital embodiments. Thus,
the input signals may be time samples that may have been derived from analog audio
signals. The time samples may be encoded as linear pulse-code modulation (PCM)
signals. Each linear PCM audio input channel is processed by a filterbank function or
device having both an in-phase and a quadrature output, such as a 512-point windowed
forward discrete Fourier transform (DFT) (as implemented by a Fast Fourier Transform
(FFT)). The filterbank may be considered to be a time-domain to frequency-domain
transform.
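By way of illustration only, the filterbank described above may be sketched as a windowed forward DFT applied to overlapping blocks of PCM samples. The function names, the Hann window, and the direct (non-fast) DFT are assumptions made for this sketch and are not taken from the disclosure; a practical implementation would use an FFT.

```python
import cmath
import math


def hann_window(n):
    """Periodic Hann window of length n (an assumed window choice)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def dft(block):
    """Forward DFT. The real and imaginary parts of each output bin are the
    in-phase and quadrature components, respectively."""
    n = len(block)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(block))
        for k in range(n)
    ]


def filterbank(samples, block_len=512, overlap=0.5):
    """Split PCM samples into overlapping windowed blocks and transform each one.

    Returns a list of blocks, each a list of block_len complex bins.
    """
    hop = int(block_len * (1.0 - overlap))
    window = hann_window(block_len)
    blocks = []
    for start in range(0, len(samples) - block_len + 1, hop):
        segment = samples[start:start + block_len]
        blocks.append(dft([s * w for s, w in zip(segment, window)]))
    return blocks
```

Calling `filterbank(pcm_samples)` with the default arguments corresponds to the 512-point transform mentioned above; smaller block lengths are convenient for experimentation because the direct DFT used here is slow.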

FIG. 1 shows a first PCM channel input (channel "1") applied to a filterbank
function or device, "Filterbank" 2, and a second PCM channel input (channel "n")
applied, respectively, to another filterbank function or device, "Filterbank" 4. There may
be "n" input channels, where "n" is a whole positive integer equal to two or more. Thus,
there also are "n" Filterbanks, each receiving a unique one of the "n" input channels. For
simplicity in presentation, FIG. 1 shows only two input channels, "1" and "n".

When a Filterbank is implemented by an FFT, input time-domain signals are
segmented into consecutive blocks and are usually processed in overlapping blocks. The
FFT's discrete frequency outputs (transform coefficients) are referred to as bins, each
having a complex value with real and imaginary parts corresponding, respectively, to
in-phase and quadrature components. Contiguous transform bins may be grouped into
subbands approximating critical bandwidths of the human ear, and most sidechain
information produced by the encoder, as will be described, may be calculated and
transmitted on a per-subband basis in order to minimize processing resources and to
reduce the bitrate. Multiple successive time-domain blocks may be grouped into frames,
with individual block values averaged or otherwise combined or accumulated across each
frame, to minimize the sidechain data rate. In examples described herein, each filterbank
is implemented by an FFT, contiguous transform bins are grouped into subbands, blocks
are grouped into frames and sidechain data is sent on a once per-frame basis.
Alternatively, sidechain data may be sent on a more than once per frame basis (e.g., once
per block). See, for example, FIG. 3 and its description, hereinafter. As is well known,
there is a tradeoff between the frequency at which sidechain information is sent and the
required bitrate.

A suitable practical implementation of aspects of the present invention may
employ fixed length frames of about 32 milliseconds when a 48 kHz sampling rate is
employed, each frame having six blocks at intervals of about 5.3 milliseconds each
(employing, for example, blocks having a duration of about 10.6 milliseconds with a 50%
overlap). However, neither such timings nor the employment of fixed length frames nor
their division into a fixed number of blocks is critical to practicing aspects of the
invention provided that information described herein as being sent on a per-frame basis is
sent no less frequently than about every 40 milliseconds. Frames may be of arbitrary size
and their size may vary dynamically. Variable block lengths may be employed as in the
AC-3 system cited above. It is with that understanding that reference is made herein to
"frames" and "blocks."
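The timings above may be checked with a short calculation. The sketch below assumes that each block is 512 samples long (matching the 512-point transform mentioned earlier); that pairing, and the function and key names, are assumptions made for illustration.

```python
SAMPLE_RATE_HZ = 48_000
BLOCK_LEN = 512        # samples per block (assumed to match the 512-point DFT)
OVERLAP = 0.5          # 50% overlap between successive blocks
BLOCKS_PER_FRAME = 6


def frame_timing(sample_rate_hz=SAMPLE_RATE_HZ, block_len=BLOCK_LEN,
                 overlap=OVERLAP, blocks_per_frame=BLOCKS_PER_FRAME):
    """Return block duration, block interval and frame duration in milliseconds."""
    hop = int(block_len * (1.0 - overlap))          # samples between block starts
    block_ms = 1000.0 * block_len / sample_rate_hz
    hop_ms = 1000.0 * hop / sample_rate_hz
    frame_ms = blocks_per_frame * hop_ms
    return {
        "block_ms": block_ms,
        "hop_ms": hop_ms,
        "frame_ms": frame_ms,
        # Per-frame information is to be sent at least about every 40 ms.
        "meets_40_ms_limit": frame_ms <= 40.0,
    }
```

Under these assumptions the interval between blocks is 256 samples (about 5.3 milliseconds), each block lasts about 10.7 milliseconds, and six block intervals span 32 milliseconds, which is within the 40 millisecond limit stated above.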

In practice, if the composite mono or multichannel signal(s), or the composite
mono or multichannel signal(s) and discrete low-frequency channels, are encoded, as for
example by a perceptual coder, as described below, it is convenient to employ the same
frame and block configuration as employed in the perceptual coder. Moreover, if the
coder employs variable block lengths such that there is, from time to time, a switching
from one block length to another, it would be desirable if one or more of the sidechain
information as described herein is updated when such a block switch occurs. In order to
minimize the increase in data overhead upon the updating of sidechain information upon
the occurrence of such a switch, the frequency resolution of the updated sidechain
information may be reduced.

FIG. 3 shows an example of a simplified conceptual organization of bins and
subbands along a (vertical) frequency axis and blocks and a frame along a (horizontal)
time axis. When bins are divided into subbands that approximate critical bands, the
lowest frequency subbands have the fewest bins (e.g., one) and the number of bins per
subband increases with increasing frequency.
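The grouping of bins into subbands may be illustrated by the following sketch, in which each subband's width is made roughly proportional to its starting bin index. The proportional rule, the `growth` parameter and the function name are hypothetical and chosen only to reproduce the qualitative behavior described above; they are not an actual critical-band table.

```python
def group_bins(num_bins, growth=0.25):
    """Partition bin indices 0..num_bins-1 into contiguous subbands.

    Each subband is returned as a (start, end) pair with end exclusive.
    The width is at least one bin and grows roughly in proportion to the
    starting bin index (a hypothetical stand-in for critical bandwidths).
    """
    subbands = []
    start = 0
    while start < num_bins:
        width = max(1, int(start * growth))
        end = min(num_bins, start + width)   # the last subband may be truncated
        subbands.append((start, end))
        start = end
    return subbands
```

For example, `group_bins(257)` partitions the 257 non-redundant bins of a 512-point transform of a real signal into single-bin subbands at the lowest frequencies and progressively wider subbands above them.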

Returning to FIG. 1, the frequency-domain versions of the n time-domain
input channels, each produced by that channel's respective Filterbank (Filterbanks 2 and 4
in this example), are summed together ("downmixed") to a monophonic ("mono")
composite audio signal by an additive combining function or device "Additive Combiner"
6.

The downmixing may be applied to the entire frequency bandwidth of the input
audio signals or, optionally, it may be limited to frequencies above a given "coupling"
frequency, inasmuch as artifacts of the downmixing process may become more audible at
middle to low frequencies. In such cases, the channels may be conveyed discretely below
the coupling frequency. This strategy may be desirable even if processing artifacts are
not an issue, in that mid/low frequency subbands constructed by grouping transform bins
into critical-band-like subbands (size roughly proportional to frequency) tend to have a
small number of transform bins at low frequencies (one bin at very low frequencies) and
may be directly coded with as few or fewer bits than is required to send a downmixed
mono audio signal with sidechain information. A coupling or transition frequency as low
as 4 kHz, 2300 Hz, 1000 Hz, or even the bottom of the frequency band of the audio
signals applied to the encoder, may be acceptable for some applications, particularly those
in which a very low bitrate is important. Other frequencies may provide a useful balance
between bit savings and listener acceptance. The choice of a particular coupling
frequency is not critical to the invention. The coupling frequency may be variable and, if
variable, it may depend, for example, directly or indirectly on input signal characteristics.

Before downmixing, it is an aspect of the present invention to improve the
channels' phase angle alignments vis-à-vis each other, in order to reduce the cancellation
of out-of-phase signal components when the channels are combined and to provide an
improved mono composite channel. This may be accomplished by controllably shifting
over time the "absolute angle" of some or all of the transform bins in ones of the
channels. For example, all of the transform bins representing audio above a coupling
frequency, thus defining a frequency band of interest, may be controllably shifted over
time, as necessary, in every channel or, when one channel is used as a reference, in all but
the reference channel.

The "absolute angle" of a bin may be taken as the angle of the magnitude-and-angle
representation of each complex valued transform bin produced by a filterbank.
Controllable shifting of the absolute angles of bins in a channel is performed by an angle
rotation function or device ("Rotate Angle"). Rotate Angle 8 processes the output of
Filterbank 2 prior to its application to the downmix summation provided by Additive
Combiner 6, while Rotate Angle 10 processes the output of Filterbank 4 prior to its
application to the Additive Combiner 6. It will be appreciated that, under some signal
conditions, no angle rotation may be required for a particular transform bin over a time
period (the time period of a frame, in examples described herein). Below the coupling
frequency, the channel information may be encoded discretely (not shown in FIG. 1).

In principle, an improvement in the channels' phase angle alignments with respect
to each other may be accomplished by shifting the phase of every transform bin or
subband by the negative of its absolute phase angle, in each block throughout the
frequency band of interest. Although this substantially avoids cancellation of
out-of-phase signal components, it tends to cause artifacts that may be audible, particularly if the
resulting mono composite signal is listened to in isolation. Thus, it is desirable to employ
the principle of "least treatment" by shifting the absolute angles of bins in a channel only
as much as necessary to minimize out-of-phase cancellation in the downmix process and
minimize spatial image collapse of the multichannel signals reconstituted by the decoder.
Techniques for determining such angle shifts are described below. Such techniques
include time and frequency smoothing and the manner in which the signal processing
responds to the presence of a transient.
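The cancellation problem, and the "in principle" remedy of rotating every bin by the negative of its absolute angle, may be illustrated as follows. This sketch shows only that full-rotation case, which the text above notes may cause audible artifacts; it is not the "least treatment" technique, and the function names are assumptions made for illustration.

```python
import cmath


def downmix(bins_per_channel):
    """Additively combine corresponding bins across channels."""
    return [sum(column) for column in zip(*bins_per_channel)]


def rotate_to_zero_angle(bins):
    """Rotate each bin by the negative of its absolute angle.

    The magnitude of each bin is unchanged; its angle becomes zero.
    """
    return [b * cmath.exp(-1j * cmath.phase(b)) for b in bins]


# Two channels whose bins are exactly out of phase with each other.
channel_1 = [1 + 0j, 0 + 1j]
channel_n = [-1 + 0j, 0 - 1j]

# Without rotation the bins cancel in the composite signal.
cancelled = downmix([channel_1, channel_n])

# With full rotation the magnitudes add instead of cancelling.
aligned = downmix([rotate_to_zero_angle(channel_1),
                   rotate_to_zero_angle(channel_n)])
```

In this example every bin of `cancelled` is zero, whereas every bin of `aligned` has a magnitude equal to the sum of the contributing magnitudes.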

Energy normalization may also be performed on a per-bin basis in the encoder to
reduce further any remaining out-of-phase cancellation of isolated bins, as described
further below. Also as described further below, energy normalization may also be
performed on a per-subband basis (in the decoder) to assure that the energy of the mono
composite signal equals the sums of the energies of the contributing channels.

Each input channel has an audio analyzer function or device ("Audio Analyzer")
associated with it for generating the sidechain information for that channel and for
controlling the amount or degree of angle rotation applied to the channel before it is
applied to the downmix summation 6. The Filterbank outputs of channels 1 and n are
applied to Audio Analyzer 12 and to Audio Analyzer 14, respectively. Audio Analyzer
12 generates the sidechain information for channel 1 and the amount of phase angle
rotation for channel 1. Audio Analyzer 14 generates the sidechain information for
channel n and the amount of angle rotation for channel n. It will be understood that such
references herein to "angle" refer to phase angle.

The sidechain information for each channel generated by an audio analyzer for
each channel may include:

an Amplitude Scale Factor ("Amplitude SF"),

an Angle Control Parameter,

a Decorrelation Scale Factor ("Decorrelation SF"),

a Transient Flag, and

optionally, an Interpolation Flag.

Such sidechain information may be characterized as "spatial parameters," indicative of
spatial properties of the channels and/or indicative of signal characteristics that may be
relevant to spatial processing, such as transients. In each case, the sidechain information
applies to a single subband (except for the Transient Flag and the Interpolation Flag, each
of which applies to all subbands within a channel) and may be updated once per frame, as
in the examples described below, or upon the occurrence of a block switch in a related
coder. Further details of the various spatial parameters are set forth below. The angle
rotation for a particular channel in the encoder may be taken as the polarity-reversed
Angle Control Parameter that forms part of the sidechain information.

If a reference channel is employed, that channel may not require an Audio
Analyzer or, alternatively, may require an Audio Analyzer that generates only Amplitude
Scale Factor sidechain information. It is not necessary to send an Amplitude Scale Factor
if that scale factor can be deduced with sufficient accuracy by a decoder from the
Amplitude Scale Factors of the other, non-reference, channels. It is possible to deduce in
the decoder the approximate value of the reference channel's Amplitude Scale Factor if
the energy normalization in the encoder assures that the scale factors across channels
within any subband substantially sum square to 1, as described below. The deduced
approximate reference channel Amplitude Scale Factor value may have errors as a result
of the relatively coarse quantization of amplitude scale factors resulting in image shifts in
the reproduced multi-channel audio. However, in a low data rate environment, such
artifacts may be more acceptable than using the bits to send the reference channel's
Amplitude Scale Factor. Nevertheless, in some cases it may be desirable to employ an
audio analyzer for the reference channel that generates, at least, Amplitude Scale Factor
sidechain information.
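The deduction of the reference channel's Amplitude Scale Factor may be sketched as follows, assuming that the scale factors of all channels within a subband sum square to 1. The function name is hypothetical, and the clamping to zero is an assumed safeguard against quantization error rather than part of the disclosure.

```python
import math


def deduce_reference_scale_factor(other_scale_factors):
    """Deduce the reference channel's scale factor for one subband.

    Assumes sf_ref**2 + sum(sf**2 for the other channels) == 1. Quantization
    of the received scale factors can push the remainder slightly below
    zero, so it is clamped before taking the square root.
    """
    remainder = 1.0 - sum(sf * sf for sf in other_scale_factors)
    return math.sqrt(max(0.0, remainder))
```

For example, with one non-reference channel whose scale factor is 0.6, the deduced reference scale factor is approximately 0.8, since 0.36 + 0.64 = 1.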

FIG. 1 shows in a dashed line an optional input to each audio analyzer from the
PCM time domain input to the audio analyzer in the channel. This input may be used by
the Audio Analyzer to detect a transient over a time period (the period of a block or
frame, in the examples described herein) and to generate a transient indicator (e.g., a
one-bit "Transient Flag") in response to a transient. Alternatively, as described below in the
comments to Step 408 of FIG. 4, a transient may be detected in the frequency domain, in
which case the Audio Analyzer need not receive a time-domain input.

The mono composite audio signal and the sidechain information for all the
channels (or all the channels except the reference channel) may be stored, transmitted, or
stored and transmitted to a decoding process or device ("Decoder"). Preliminary to the
storage, transmission, or storage and transmission, the various audio signals and various
sidechain information may be multiplexed and packed into one or more bitstreams
suitable for the storage, transmission or storage and transmission medium or media. The
mono composite audio may be applied to a data-rate reducing encoding process or device
such as, for example, a perceptual encoder or to a perceptual encoder and an entropy
coder (e.g., arithmetic or Huffman coder) (sometimes referred to as a "lossless" coder)
prior to storage, transmission, or storage and transmission. Also, as mentioned above, the
mono composite audio and related sidechain information may be derived from multiple
input channels only for audio frequencies above a certain frequency (a "coupling"
frequency). In that case, the audio frequencies below the coupling frequency in each of
the multiple input channels may be stored, transmitted or stored and transmitted as
discrete channels or may be combined or processed in some manner other than as
described herein. Such discrete or otherwise-combined channels may also be applied to a
data reducing encoding process or device such as, for example, a perceptual encoder or a
perceptual encoder and an entropy encoder. The mono composite audio and the discrete
multichannel audio may all be applied to an integrated perceptual encoding or perceptual
and entropy encoding process or device.

The particular manner in which sidechain information is carried in the encoder
bitstream is not critical to the invention. If desired, the sidechain information may be
carried in such a way that the bitstream is compatible with legacy decoders (i.e., the
bitstream is backwards-compatible). Many suitable techniques for doing so are known.
For example, many encoders generate a bitstream having unused or null bits that are
ignored by the decoder. An example of such an arrangement is set forth in United States
Patent 6,807,528 B1 of Truman et al, entitled "Adding Data to a Compressed Data
Frame," October 19, 2004, which patent is hereby incorporated by reference in its
entirety. Such bits may be replaced with the sidechain information. Another example is
that the sidechain information may be steganographically encoded in the encoder's
bitstream. Alternatively, the sidechain information may be stored or transmitted
separately from the backwards-compatible bitstream by any technique that permits the
transmission or storage of such information along with a mono/stereo bitstream
compatible with legacy decoders.

WO 2005/086139 PCT/US2005/006359

Basic 1:N and 1:M Decoder

Referring to FIG. 2, a decoder function or device ("Decoder") embodying aspects
of the present invention is shown. The figure is an example of a function or structure that
performs as a basic decoder embodying aspects of the invention. Other functional or
structural arrangements that practice aspects of the invention may be employed, including
alternative and/or equivalent functional or structural arrangements described below.

The Decoder receives the mono composite audio signal and the sidechain
information for all the channels or all the channels except the reference channel. If
necessary, the composite audio signal and related sidechain information is demultiplexed,
unpacked and/or decoded. Decoding may employ a table lookup. The goal is to derive
from the mono composite audio channels a plurality of individual audio channels
approximating respective ones of the audio channels applied to the Encoder of FIG. 1,
subject to bitrate-reducing techniques of the present invention that are described herein.

Of course, one may choose not to recover all of the channels applied to the
encoder or to use only the monophonic composite signal. Alternatively, channels in
addition to the ones applied to the Encoder may be derived from the output of a Decoder
according to aspects of the present invention by employing aspects of the inventions
described in International Application PCT/US 02/03619, filed February 7, 2002,
published August 15, 2002, designating the United States, and its resulting U.S. national
application S.N. 10/467,213, filed August 5, 2003, and in International Application
PCT/US03/24570, filed August 6, 2003, published March 4, 2004 as WO 2004/019656,
designating the United States, and its resulting U.S. national application S.N. 10/522,515,
filed January 27, 2005. Said applications are hereby incorporated by reference in their
entirety. Channels recovered by a Decoder practicing aspects of the present invention are
particularly useful in connection with the channel multiplication techniques of the cited
and incorporated applications in that the recovered channels not only have useful
interchannel amplitude relationships but also have useful interchannel phase relationships.
Another alternative for channel multiplication is to employ a matrix decoder to derive
additional channels. The interchannel amplitude- and phase-preservation aspects of the
present invention make the output channels of a decoder embodying aspects of the
present invention particularly suitable for application to an amplitude- and phase-sensitive
matrix decoder. Many such matrix decoders employ wideband control circuits that
operate properly only when the signals applied to them are stereo throughout the signals'
bandwidth. Thus, if the aspects of the present invention are embodied in an N:1:N system
in which N is 2, the two channels recovered by the decoder may be applied to a 2:M
active matrix decoder. Such channels may have been discrete channels below a coupling
frequency, as mentioned above. Many suitable active matrix decoders are well known in
the art, including, for example, matrix decoders known as "Pro Logic" and "Pro Logic II"
decoders ("Pro Logic" is a trademark of Dolby Laboratories Licensing Corporation).
Aspects of Pro Logic decoders are disclosed in U.S. Patents 4,799,260 and 4,941,177,
each of which is incorporated by reference herein in its entirety. Aspects of Pro Logic II
decoders are disclosed in pending U.S. Patent Application S.N. 09/532,711 of Fosgate,
entitled "Method for Deriving at Least Three Audio Signals from Two Input Audio
Signals," filed March 22, 2000 and published as WO 01/41504 on June 7, 2001, and in
pending U.S. Patent Application S.N. 10/362,786 of Fosgate et al, entitled "Method for
Apparatus for Audio Matrix Decoding," filed February 25, 2003 and published as US
2004/0125960 A1 on July 1, 2004. Each of said applications is incorporated by reference
herein in its entirety. Some aspects of the operation of Dolby Pro Logic and Pro Logic II
decoders are explained, for example, in papers available on the Dolby Laboratories'
website (www.dolby.com): "Dolby Surround Pro Logic Decoder Principles of
Operation," by Roger Dressler, and "Mixing with Dolby Pro Logic II Technology," by Jim
Hilson. Other suitable active matrix decoders may include those described in one or more
of the following U.S. Patents and published International Applications (each designating
the United States), each of which is hereby incorporated by reference in its entirety:
5,046,098; 5,274,740; 5,400,433; 5,625,696; 5,644,640; 5,504,819; 5,428,687; 5,172,415;
and WO 02/19768.

Referring again to FIG. 2, the received mono composite audio channel is applied
to a plurality of signal paths from which a respective one of each of the recovered
multiple audio channels is derived. Each channel-deriving path includes, in either order,
an amplitude adjusting function or device ("Adjust Amplitude") and an angle rotation
function or device ("Rotate Angle").

The Adjust Amplitudes apply gains or losses to the mono composite signal so that,
under certain signal conditions, the relative output magnitudes (or energies) of the output
channels derived from it are similar to those of the channels at the input of the encoder.
Alternatively, under certain signal conditions when "randomized" angle variations are
imposed, as next described, a controllable amount of "randomized" amplitude variations
may also be imposed on the amplitude of a recovered channel in order to improve its
decorrelation with respect to other ones of the recovered channels.

The Rotate Angles apply phase rotations so that, under certain signal conditions,
the relative phase angles of the output channels derived from the mono composite signal
are similar to those of the channels at the input of the encoder. Preferably, under certain
signal conditions, a controllable amount of "randomized" angle variations is also imposed
on the angle of a recovered channel in order to improve its decorrelation with respect to
other ones of the recovered channels.

As discussed further below, "randomized" angle and amplitude variations may include
not only pseudo-random and truly random variations, but also deterministically-generated
variations that have the effect of reducing cross-correlation between channels. This is
discussed further below in the Comments to Step 505 of FIG. 5A.

Conceptually, the Adjust Amplitude and Rotate Angle for a particular channel
scale the mono composite audio DFT coefficients to yield reconstructed transform bin
values for the channel.
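Conceptually, this scaling may be sketched as a complex multiplication of each mono composite bin by a gain and a unit-magnitude rotation. The function name, the use of radians, and the application of a single scale factor and angle to all bins passed in (as for the bins of one subband) are assumptions made for illustration; the randomized variations described above are omitted.

```python
import cmath


def reconstruct_bins(mono_bins, amplitude_sf, angle_radians):
    """Apply Adjust Amplitude and Rotate Angle to mono composite bins.

    Each bin is multiplied by the amplitude scale factor and rotated by the
    given angle; the two operations commute, so their order does not matter.
    """
    rotation = cmath.exp(1j * angle_radians)
    return [amplitude_sf * rotation * b for b in mono_bins]
```

For example, `reconstruct_bins(mono_bins, 0.5, cmath.pi / 2)` halves the magnitude of each bin and advances its phase by a quarter cycle.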

The Adjust Amplitude for each channel may be controlled at least by the
recovered sidechain Amplitude Scale Factor for the particular channel or, in the case of
the reference channel, either from the recovered sidechain Amplitude Scale Factor for the
reference channel or from an Amplitude Scale Factor deduced from the recovered
sidechain Amplitude Scale Factors of the other, non-reference, channels. Alternatively,
to enhance decorrelation of the recovered channels, the Adjust Amplitude may also be
controlled by a Randomized Amplitude Scale Factor Parameter derived from the
recovered sidechain Decorrelation Scale Factor for a particular channel and the recovered
sidechain Transient Flag for the particular channel.

The Rotate Angle for each channel may be controlled at least by the recovered
sidechain Angle Control Parameter (in which case, the Rotate Angle in the decoder may
substantially undo the angle rotation provided by the Rotate Angle in the encoder). To
enhance decorrelation of the recovered channels, a Rotate Angle may also be controlled
by a Randomized Angle Control Parameter derived from the recovered sidechain
Decorrelation Scale Factor for a particular channel and the recovered sidechain Transient
Flag for the particular channel. The Randomized Angle Control Parameter for a channel,
and, if employed, the Randomized Amplitude Scale Factor for a channel, may be derived
from the recovered Decorrelation Scale Factor for the channel and the recovered
Transient Flag for the channel by a controllable decorrelator function or device
("Controllable Decorrelator").

Referring to the example of FIG. 2, the recovered mono composite audio is applied to a
first channel audio recovery path 22, which derives the channel 1 audio, and to a second
channel audio recovery path 24, which derives the channel n audio. Audio path 22 includes
an Adjust Amplitude 26, a Rotate Angle 28, and, if a PCM output is desired, an inverse
filterbank function or device ("Inverse Filterbank") 30. Similarly, audio path 24
includes an Adjust Amplitude 32, a Rotate Angle 34, and, if a PCM output is desired, an
inverse filterbank function or device ("Inverse Filterbank") 36. As with the case of
FIG. 1, only two channels are shown for simplicity in presentation, it being understood
that there may be more than two channels.

The recovered sidechain information for the first channel, channel 1, may include an
Amplitude Scale Factor, an Angle Control Parameter, a Decorrelation Scale Factor, a
Transient Flag, and, optionally, an Interpolation Flag, as stated above in connection with
the description of a basic Encoder. The Amplitude Scale Factor is applied to Adjust
Amplitude 26. If the optional Interpolation Flag is employed, an optional frequency
interpolator or interpolator function ("Interpolator") 27 may be employed in order to
interpolate the Angle Control Parameter across frequency (e.g., across the bins in each
subband of a channel). Such interpolation may be, for example, a linear interpolation of
the bin angles between the centers of each subband. The state of the one-bit Interpolation
Flag selects whether or not interpolation across frequency is employed, as is explained
further below. The Transient Flag and Decorrelation Scale Factor are applied to a
Controllable Decorrelator 38 that generates a Randomized Angle Control Parameter in
response thereto. The state of the one-bit Transient Flag selects one of two modes of
randomized angle decorrelation, as is explained further below. The Angle Control
Parameter, which may be interpolated across frequency if the Interpolation Flag and the
Interpolator are employed, and the Randomized Angle Control Parameter are summed together
by an additive combiner or combining function 40 in order to provide a control signal for
Rotate Angle 28. Alternatively, the Controllable Decorrelator 38 may also generate a
Randomized Amplitude Scale Factor in response to the Transient Flag and Decorrelation
Scale Factor, in addition to generating a Randomized Angle Control Parameter. The
Amplitude Scale Factor may be summed together with such a Randomized Amplitude Scale
Factor by an additive combiner or combining function (not shown) in order to provide the
control signal for the Adjust Amplitude 26.
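
As an illustrative sketch only, the Interpolator 27 and additive combiner 40 for one
channel might be arranged as follows. The function names, the `subband_edges` list (bin
index boundaries starting at bin 0), the holding of the end values outside the first and
last subband centers, and the absence of any angle-unwrapping before interpolation are all
assumptions of the example rather than features stated in this description.

```python
import math


def hold_angles(subband_angles, subband_edges):
    """Apply each subband's angle unchanged to every bin of that subband."""
    out = []
    for k, angle in enumerate(subband_angles):
        out.extend([angle] * (subband_edges[k + 1] - subband_edges[k]))
    return out


def interpolate_angles(subband_angles, subband_edges):
    """Linearly interpolate bin angles between the centers of adjacent subbands."""
    centers = [
        (subband_edges[k] + subband_edges[k + 1] - 1) / 2.0
        for k in range(len(subband_angles))
    ]
    out = []
    for b in range(subband_edges[-1]):
        if b <= centers[0]:
            out.append(subband_angles[0])       # below first center: hold
        elif b >= centers[-1]:
            out.append(subband_angles[-1])      # above last center: hold
        else:
            k = max(i for i, c in enumerate(centers) if c <= b)
            t = (b - centers[k]) / (centers[k + 1] - centers[k])
            out.append((1.0 - t) * subband_angles[k] + t * subband_angles[k + 1])
    return out


def rotate_angle_control(subband_angles, subband_edges, randomized, interpolation_flag):
    """Sum the basic (optionally interpolated) angle and the randomized angle per bin."""
    if interpolation_flag:
        basic = interpolate_angles(subband_angles, subband_edges)
    else:
        basic = hold_angles(subband_angles, subband_edges)
    return [(a + r) % (2.0 * math.pi) for a, r in zip(basic, randomized)]
```

With two four-bin subbands (`subband_edges = [0, 4, 8]`), the subband centers fall at bins
1.5 and 5.5, so interpolation affects only bins 2 through 5 and the outer bins keep the
subband values.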

Similarly, recovered sidechain information for the second channel, channel n, may also
include an Amplitude Scale Factor, an Angle Control Parameter, a Decorrelation Scale
Factor, a Transient Flag, and, optionally, an Interpolation Flag, as described above in
connection with the description of a basic encoder. The Amplitude Scale Factor is applied
to Adjust Amplitude 32. A frequency interpolator or interpolator function
("Interpolator") 33 may be employed in order to interpolate the Angle Control Parameter
across frequency. As with channel 1, the state of the one-bit Interpolation Flag selects
whether or not interpolation across frequency is employed. The Transient Flag and
Decorrelation Scale Factor are applied to a Controllable Decorrelator 42 that generates a
Randomized Angle Control Parameter in response thereto. As with channel 1, the state of
the one-bit Transient Flag selects one of two modes of randomized angle decorrelation, as
is explained further below. The Angle Control Parameter and the Randomized Angle Control
Parameter are summed together by an additive combiner or combining function 44 in order
to provide a control signal for Rotate Angle 34. Alternatively, as described above in
connection with channel 1, the Controllable Decorrelator 42 may also generate a
Randomized Amplitude Scale Factor in response to the Transient Flag and Decorrelation
Scale Factor, in addition to generating a Randomized Angle Control Parameter. The
Amplitude Scale Factor and Randomized Amplitude Scale Factor may be summed together by an
additive combiner or combining function (not shown) in order to provide the control
signal for the Adjust Amplitude 32.

Although a process or topology as just described is useful for understanding, essentially
the same results may be obtained with alternative processes or topologies that achieve
the same or similar results. For example, the order of Adjust Amplitude 26 (32) and
Rotate Angle 28 (34) may be reversed and/or there may be more than one Rotate Angle: one
that responds to the Angle Control Parameter and another that responds to the Randomized
Angle Control Parameter. The Rotate Angle may also be considered to be three rather than
one or two functions or devices, as in the example of FIG. 5 described below. If a
Randomized Amplitude Scale Factor is employed, there may be more than one Adjust
Amplitude: one that responds to the Amplitude Scale Factor and one that responds to the
Randomized Amplitude Scale Factor. Because of the human ear's greater sensitivity to
amplitude relative to phase, if a Randomized Amplitude Scale Factor is employed, it may
be desirable to scale its effect relative to the effect of the Randomized Angle Control
Parameter so that its effect on amplitude is less than the effect that the Randomized
Angle Control Parameter has on phase angle. As another alternative process or topology,
the Decorrelation Scale Factor may be used to control the ratio of randomized phase angle
versus basic phase angle (rather than adding a parameter representing a randomized phase
angle to a parameter representing the basic phase angle), and if also employed, the ratio
of randomized amplitude shift versus basic amplitude shift (rather than adding a scale
factor representing a randomized amplitude to a scale factor representing the basic
amplitude) (i.e., a variable crossfade in each case).

If a reference channel is employed, as discussed above in connection with the basic
encoder, the Rotate Angle, Controllable Decorrelator and Additive Combiner for that
channel may be omitted inasmuch as the sidechain information for the reference channel
may include only the Amplitude Scale Factor (or, alternatively, if the sidechain
information does not contain an Amplitude Scale Factor for the reference channel, it may
be deduced from Amplitude Scale Factors of the other channels when the energy
normalization in the encoder assures that the scale factors across channels within a
subband sum square to 1). An Amplitude Adjust is provided for the reference channel and
it is controlled by a received or derived Amplitude Scale Factor for the reference
channel. Whether the reference channel's Amplitude Scale Factor is derived from the
sidechain or is deduced in the decoder, the recovered reference channel is an
amplitude-scaled version of the mono composite channel. It does not require angle
rotation because it is the reference for the other channels' rotations.
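
For example, when the scale factors across channels within a subband sum square to 1, the
missing reference-channel factor follows from the others by simple arithmetic. The sketch
below assumes the scale factors are expressed as linear gains (not as quantized indices);
the function name is hypothetical, and the clamp at zero is an added guard against
rounding error rather than something stated in this description.

```python
import math


def deduce_reference_scale(other_scales):
    """Deduce the reference channel's linear scale factor for one subband.

    Relies on the normalization sum(scale**2 over all channels) == 1.
    """
    remainder = 1.0 - sum(s * s for s in other_scales)
    return math.sqrt(max(remainder, 0.0))  # clamp guards against small negative rounding


# Two-channel case: a non-reference factor of 0.6 leaves 1 - 0.36 = 0.64 of the
# energy, so the reference factor is sqrt(0.64), approximately 0.8.
reference_scale = deduce_reference_scale([0.6])
```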

Although adjusting the relative amplitude of recovered channels may provide a modest
degree of decorrelation, if used alone amplitude adjustment is likely to result in a
reproduced soundfield substantially lacking in spatialization or imaging for many signal
conditions (e.g., a "collapsed" soundfield). Amplitude adjustment may affect interaural
level differences at the ear, which is only one of the psychoacoustic directional cues
employed by the ear. Thus, according to aspects of the invention, certain angle-adjusting
techniques may be employed, depending on signal conditions, to provide additional
decorrelation. Reference may be made to Table 1 that provides abbreviated comments
useful in understanding the multiple angle-adjusting decorrelation techniques or modes of
operation that may be employed in accordance with aspects of the invention. Other
decorrelation techniques as described below in connection with the examples of FIGS. 8
and 9 may be employed instead of or in addition to the techniques of Table 1.

In practice, applying angle rotations and magnitude alterations may result in circular
convolution (also known as cyclic or periodic convolution). Although, generally, it is
desirable to avoid circular convolution, undesirable audible artifacts resulting from
circular convolution are somewhat reduced by complementary angle shifting in an encoder
and decoder. In addition, the effects of circular convolution may be tolerated in low
cost implementations of aspects of the present invention, particularly those in which the
downmixing to mono or multiple channels occurs only in part of the audio frequency band,
such as, for example above 1500 Hz (in which case the audible effects of circular
convolution are minimal). Alternatively, circular convolution may be avoided or minimized
by any suitable technique, including, for example, an appropriate use of zero padding.
One way to use zero padding is to transform the proposed frequency domain variation
(representing angle rotations and amplitude scaling) to the time domain, window it (with
an arbitrary window), pad it with zeros, then transform back to the frequency domain and
multiply by the frequency domain version of the audio to be processed (the audio need not
be windowed).
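
The zero-padding steps just listed may be sketched as follows. This is a minimal
illustration under stated assumptions, not the implementation of any embodiment: it uses a
naive O(N^2) DFT written out in plain Python for self-containment, it also zero-pads the
audio block so that both spectra share the same length, and the function names and the
padded length (impulse length plus audio length minus one) are choices made for the
example.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive discrete Fourier transform (or its inverse) of a sequence."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def apply_variation_without_wraparound(freq_variation, audio_block, window):
    """Apply a frequency-domain variation to audio while avoiding circular wraparound."""
    # 1. Transform the proposed frequency-domain variation to the time domain.
    impulse = dft(freq_variation, inverse=True)
    # 2. Window it (any window of the same length may be supplied).
    impulse = [h * w for h, w in zip(impulse, window)]
    # 3. Pad with zeros so the product has room for the full linear convolution.
    n = len(impulse) + len(audio_block) - 1
    padded_impulse = impulse + [0.0] * (n - len(impulse))
    padded_audio = list(audio_block) + [0.0] * (n - len(audio_block))
    # 4. Transform back and multiply by the frequency-domain version of the audio.
    return [h * x for h, x in zip(dft(padded_impulse), dft(padded_audio))]
```

The inverse transform of the returned spectrum equals the linear (non-circular)
convolution of the windowed impulse response with the audio block, which is the property
the padding is meant to secure.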

Table 1
Angle-Adjusting Decorrelation Techniques

Type of Signal (typical example)
    Technique 1: Spectrally static source
    Technique 2: Complex continuous signals
    Technique 3: Complex impulsive signals (transients)

Effect on Decorrelation
    Technique 1: Decorrelates low frequency and steady-state signal components
    Technique 2: Decorrelates non-impulsive complex signal components
    Technique 3: Decorrelates impulsive high frequency signal components

Effect of transient present in frame
    Technique 1: Operates with shortened time constant
    Technique 2: Does not operate
    Technique 3: Operates

What is done
    Technique 1: Slowly shifts (frame-by-frame) bin angle in a channel
    Technique 2: Adds to the angle of Technique 1 a time-invariant randomized angle on a
                 bin-by-bin basis in a channel
    Technique 3: Adds to the angle of Technique 1 a rapidly-changing (block by block)
                 randomized angle on a subband-by-subband basis in a channel

Controlled by or Scaled by
    Technique 1: Basic phase angle is controlled by Angle Control Parameter
    Technique 2: Amount of randomized angle is scaled directly by Decorrelation SF; same
                 scaling across subband, scaling updated every frame
    Technique 3: Amount of randomized angle is scaled indirectly by Decorrelation SF; same
                 scaling across subband, scaling updated every frame

Frequency Resolution of angle shift
    Technique 1: Subband (same or interpolated shift value applied to all bins in each
                 subband)
    Technique 2: Bin (different randomized shift value applied to each bin)
    Technique 3: Subband (same randomized shift value applied to all bins in each subband;
                 different randomized shift value applied to each subband in channel)

Time Resolution
    Technique 1: Frame (shift values updated every frame)
    Technique 2: Randomized shift values remain the same and do not change
    Technique 3: Block (randomized shift values updated every block)


For signals that are substantially static spectrally, such as, for example, a pitch pipe
note, a first technique ("Technique 1") restores the angle of the received mono composite
signal relative to the angle of each of the other recovered channels to an angle similar
(subject to frequency and time granularity and to quantization) to the original angle of
the channel relative to the other channels at the input of the encoder. Phase angle
differences are useful, particularly, for providing decorrelation of low-frequency signal
components below about 1500 Hz where the ear follows individual cycles of the audio
signal. Preferably, Technique 1 operates under all signal conditions to provide a basic
angle shift.

For high-frequency signal components above about 1500 Hz, the ear does not follow
individual cycles of sound but instead responds to waveform envelopes (on a critical band
basis). Hence, above about 1500 Hz decorrelation is better provided by differences in
signal envelopes rather than phase angle differences. Applying phase angle shifts only in
accordance with Technique 1 does not alter the envelopes of signals sufficiently to
decorrelate high frequency signals. The second and third techniques ("Technique 2" and
"Technique 3", respectively) add a controllable amount of randomized angle variations to
the angle determined by Technique 1 under certain signal conditions, thereby causing a
controllable amount of randomized envelope variations, which enhances decorrelation.

Randomized changes in phase angle are a desirable way to cause randomized changes in the
envelopes of signals. A particular envelope results from the interaction of a particular
combination of amplitudes and phases of spectral components within a subband. Although
changing the amplitudes of spectral components within a subband changes the envelope,
large amplitude changes are required to obtain a significant change in the envelope,
which is undesirable because the human ear is sensitive to variations in spectral
amplitude. In contrast, changing the spectral components' phase angles has a greater
effect on the envelope than changing the spectral components' amplitudes: spectral
components no longer line up the same way, so the reinforcements and subtractions that
define the envelope occur at different times, thereby changing the envelope. Although the
human ear has some envelope sensitivity, the ear is relatively phase deaf, so the overall
sound quality remains substantially similar. Nevertheless, for some signal conditions,
some randomization of the amplitudes of spectral components along with randomization of
the phases of spectral components may provide an enhanced randomization of signal
envelopes provided that such amplitude randomization does not cause undesirable audible
artifacts.

Preferably, a controllable amount or degree of Technique 2 or Technique 3 operates along
with Technique 1 under certain signal conditions. The Transient Flag selects Technique 2
(no transient present in the frame or block, depending on whether the Transient Flag is
sent at the frame or block rate) or Technique 3 (transient present in the frame or
block). Thus, there are multiple modes of operation, depending on whether or not a
transient is present. Alternatively, in addition, under certain signal conditions, a
controllable amount or degree of amplitude randomization also operates along with the
amplitude scaling that seeks to restore the original channel amplitude.

Technique 2 is suitable for complex continuous signals that are rich in harmonics, such
as massed orchestral violins. Technique 3 is suitable for complex impulsive or transient
signals, such as applause, castanets, etc. (Technique 2 time smears claps in applause,
making it unsuitable for such signals). As explained further below, in order to minimize
audible artifacts, Technique 2 and Technique 3 have different time and frequency
resolutions for applying randomized angle variations: Technique 2 is selected when a
transient is not present, whereas Technique 3 is selected when a transient is present.

Technique 1 slowly shifts (frame by frame) the bin angle in a channel. The amount or
degree of this basic shift is controlled by the Angle Control Parameter (no shift if the
parameter is zero). As explained further below, either the same or an interpolated
parameter is applied to all bins in each subband and the parameter is updated every
frame. Consequently, each subband of each channel may have a phase shift with respect to
other channels, providing a degree of decorrelation at low frequencies (below about
1500 Hz). However, Technique 1, by itself, is unsuitable for a transient signal such as
applause. For such signal conditions, the reproduced channels may exhibit an annoying
unstable comb-filter effect. In the case of applause, essentially no decorrelation is
provided by adjusting only the relative amplitude of recovered channels because all
channels tend to have the same amplitude over the period of a frame.

Technique 2 operates when a transient is not present. Technique 2 adds to the angle shift
of Technique 1 a randomized angle shift that does not change with time, on a bin-by-bin
basis (each bin has a different randomized shift) in a channel, causing the envelopes of
the channels to be different from one another, thus providing decorrelation of complex
signals among the channels. Maintaining the randomized phase angle values constant over
time avoids block or frame artifacts that may result from block-to-block or
frame-to-frame alteration of bin phase angles. While this technique is a very useful
decorrelation tool when a transient is not present, it may temporally smear a transient
(resulting in what is often referred to as "pre-noise"; the post-transient smearing is
masked by the transient). The amount or degree of additional shift provided by Technique
2 is scaled directly by the Decorrelation Scale Factor (there is no additional shift if
the scale factor is zero). Ideally, the amount of randomized phase angle added to the
base angle shift (of Technique 1) according to Technique 2 is controlled by the
Decorrelation Scale Factor in a manner that minimizes audible signal warbling artifacts.
Such minimization of signal warbling artifacts results from the manner in which the
Decorrelation Scale Factor is derived and the application of appropriate time smoothing,
as described below. Although a different additional randomized angle shift value is
applied to each bin and that shift value does not change, the same scaling is applied
across a subband and the scaling is updated every frame.

Technique 3 operates in the presence of a transient in the frame or block, depending on
the rate at which the Transient Flag is sent. It shifts all the bins in each subband in a
channel from block to block with a unique randomized angle value, common to all bins in
the subband, causing not only the envelopes, but also the amplitudes and phases, of the
signals in a channel to change with respect to other channels from block to block. These
changes in time and frequency resolution of the angle randomizing reduce steady-state
signal similarities among the channels and provide decorrelation of the channels
substantially without causing "pre-noise" artifacts. The change in frequency resolution
of the angle randomizing, from very fine (all bins different in a channel) in Technique 2
to coarse (all bins within a subband the same, but each subband different) in Technique
3, is particularly useful in minimizing "pre-noise" artifacts. Although the ear does not
respond to pure angle changes directly at high frequencies, when two or more channels mix
acoustically on their way from loudspeakers to a listener, phase differences may cause
amplitude changes (comb-filter effects) that may be audible and objectionable, and these
are broken up by Technique 3. The impulsive characteristics of the signal minimize
block-rate artifacts that might otherwise occur. Thus, Technique 3 adds to the phase
shift of Technique 1 a rapidly changing (block-by-block) randomized angle shift on a
subband-by-subband basis in a channel. The amount or degree of additional shift is scaled
indirectly, as described below, by the Decorrelation Scale Factor (there is no additional
shift if the scale factor is zero). The same scaling is applied across a subband and the
scaling is updated every frame.

Although the angle-adjusting techniques have been characterized as three techniques, this
is a matter of semantics and they may also be characterized as two techniques: (1) a
combination of Technique 1 and a variable degree of Technique 2, which may be zero, and
(2) a combination of Technique 1 and a variable degree of Technique 3, which may be zero.
For convenience in presentation, the techniques are treated as being three techniques.
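
The differing time and frequency resolutions of Technique 2 and Technique 3 may be
illustrated by the following sketch of a decorrelator that returns one randomized angle
offset per bin for a single block of one channel. It is offered only as an aid to
understanding: the function names, the uniform distribution over plus or minus pi, the
seeded generators, and the `subband_edges` layout are assumptions, and, for brevity, the
sketch scales the Technique 3 offset directly by the Decorrelation Scale Factor, whereas
this description states that such scaling is indirect.

```python
import math
import random


def make_bin_table(num_bins, seed=0):
    """Build a fixed table of per-bin randomized angles (used when no transient)."""
    rng = random.Random(seed)
    return [rng.uniform(-math.pi, math.pi) for _ in range(num_bins)]


def randomized_angles(transient_flag, decorr_scale, subband_edges, bin_table, block_rng):
    """Return the randomized angle offset for every bin in one block.

    decorr_scale -- one Decorrelation Scale Factor (0..1) per subband
    bin_table    -- time-invariant per-bin angles (Technique 2)
    block_rng    -- generator drawn from once per subband per block (Technique 3)
    """
    offsets = []
    for sb in range(len(subband_edges) - 1):
        lo, hi = subband_edges[sb], subband_edges[sb + 1]
        scale = decorr_scale[sb]
        if transient_flag:
            # Technique 3: one new value per subband, shared by all its bins,
            # redrawn every block.
            shared = block_rng.uniform(-math.pi, math.pi)
            offsets.extend([scale * shared] * (hi - lo))
        else:
            # Technique 2: a different value per bin that never changes with time.
            offsets.extend(scale * bin_table[b] for b in range(lo, hi))
    return offsets
```

In either mode the returned offsets would be added to the basic angle of Technique 1, and
a scale factor of zero leaves the basic angle unchanged.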

Aspects of the multiple mode decorrelation techniques and modifications of them may be
employed in providing decorrelation of audio signals derived, as by upmixing, from one or
more audio channels even when such audio channels are not derived from an encoder
according to aspects of the present invention. Such arrangements, when applied to a mono
audio channel, are sometimes referred to as "pseudo-stereo" devices and functions. Any
suitable device or function (an "upmixer") may be employed to derive multiple signals
from a mono audio channel or from multiple audio channels. Once such multiple audio
channels are derived by an upmixer, one or more of them may be decorrelated with respect
to one or more of the other derived audio signals by applying the multiple mode
decorrelation techniques described herein. In such an application, each derived audio
channel to which the decorrelation techniques are applied may be switched from one mode
of operation to another by detecting transients in the derived audio channel itself.
Alternatively, the operation of the transient-present technique (Technique 3) may be
simplified to provide no shifting of the phase angles of spectral components when a
transient is present.

Sidechain Information 
As mentioned above, the sidechain information may include: an Amplitude Scale Factor, an
Angle Control Parameter, a Decorrelation Scale Factor, a Transient Flag, and, optionally,
an Interpolation Flag. Such sidechain information for a practical embodiment of aspects
of the present invention may be summarized in the following Table 2. Typically, the
sidechain information may be updated once per frame.



Table 2
Sidechain Information Characteristics for a Channel

Subband Angle Control Parameter
    Value Range: 0 to +2π
    Represents (is "a measure of"): Smoothed time average in each subband of difference
        between angle of each bin in subband for a channel and that of the corresponding
        bin in subband of a reference channel
    Quantization Levels: 6 bit (64 levels)
    Primary Purpose: Provides basic angle rotation for each bin in channel

Subband Decorrelation Scale Factor
    Value Range: 0 to 1. The Subband Decorrelation Scale Factor is high only if both the
        Spectral-Steadiness Factor and the Interchannel Angle Consistency Factor are low.
    Represents (is "a measure of"): Spectral-steadiness of signal characteristics over
        time in a subband of a channel (the Spectral-Steadiness Factor) and the
        consistency in the same subband of a channel of bin angles with respect to
        corresponding bins of a reference channel (the Interchannel Angle Consistency
        Factor)
    Quantization Levels: 3 bit (8 levels)
    Primary Purpose: Scales randomized angle shifts added to basic angle rotation, and,
        if employed, also scales randomized Amplitude Scale Factor added to basic
        Amplitude Scale Factor, and, optionally, scales degree of reverberation

Subband Amplitude Scale Factor
    Value Range: 0 to 31 (whole integer). 0 is highest amplitude; 31 is lowest amplitude
    Represents (is "a measure of"): Energy or amplitude in subband of a channel with
        respect to energy or amplitude for same subband across all channels
    Quantization Levels: 5 bit (32 levels). Granularity is 1.5 dB, so the range is
        31*1.5 = 46.5 dB plus final value = off.
    Primary Purpose: Scales amplitude of bins in a subband in a channel

Transient Flag
    Value Range: 1, 0 (True/False) (polarity is arbitrary)
    Represents (is "a measure of"): Presence of a transient in the frame or in the block
    Quantization Levels: 1 bit (2 levels)
    Primary Purpose: Determines which technique for randomized angle shifts, or both
        angle shifts and amplitude shifts, is employed

Interpolation Flag
    Value Range: 1, 0 (True/False) (polarity is arbitrary)
    Represents (is "a measure of"): A spectral peak near a subband boundary or phase
        angles within a channel have a linear progression
    Quantization Levels: 1 bit (2 levels)
    Primary Purpose: Determines if the basic angle rotation is interpolated across
        frequency


In each case, the sidechain information of a channel applies to a single subband (except
for the Transient Flag and the Interpolation Flag, each of which apply to all subbands in
a channel) and may be updated once per frame. Although the time resolution (once per
frame), frequency resolution (subband), value ranges and quantization levels indicated
have been found to provide useful performance and a useful compromise between a low
bitrate and performance, it will be appreciated that these time and frequency
resolutions, value ranges and quantization levels are not critical and that other
resolutions, ranges and levels may be employed in practicing aspects of the invention.
For example, the Transient Flag and/or the Interpolation Flag, if employed, may be
updated once per block with only a minimal increase in sidechain data overhead. In the
case of the Transient Flag, doing so has the advantage that the switching from Technique
2 to Technique 3 and vice-versa is more accurate. In addition, as mentioned above,
sidechain information may be updated upon the occurrence of a block switch of a related
coder.
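
The bit allocations of Table 2 may be illustrated with the following sketch. Only the bit
widths (6, 3, 5 and 1 bit) and the value ranges come from Table 2; the uniform quantizer
step sizes, the rounding rule, the function names, and the raw (uncompressed) bit count
are assumptions made for the example, and any further coding of the sidechain is ignored.

```python
import math

ANGLE_BITS = 6       # Subband Angle Control Parameter, 64 levels over 0 to 2*pi
DECORR_BITS = 3      # Subband Decorrelation Scale Factor, 8 levels over 0 to 1
AMPLITUDE_BITS = 5   # Subband Amplitude Scale Factor, 32 levels


def quantize_angle(angle):
    """Map an angle in radians to a 6-bit index, assuming uniform steps of 2*pi/64."""
    levels = 1 << ANGLE_BITS
    step = 2.0 * math.pi / levels
    return int(round((angle % (2.0 * math.pi)) / step)) % levels


def dequantize_angle(index):
    """Return the angle in radians represented by a 6-bit index."""
    return index * 2.0 * math.pi / (1 << ANGLE_BITS)


def quantize_decorrelation(value):
    """Map a scale factor in 0..1 to a 3-bit index, assuming uniform steps of 1/7."""
    clipped = min(max(value, 0.0), 1.0)
    return int(round(clipped * ((1 << DECORR_BITS) - 1)))


def raw_sidechain_bits(num_subbands, use_interpolation_flag=True):
    """Raw bits per channel per frame: per-subband parameters plus per-channel flags."""
    per_subband = ANGLE_BITS + DECORR_BITS + AMPLITUDE_BITS
    flags = 1 + (1 if use_interpolation_flag else 0)
    return num_subbands * per_subband + flags
```

Under these assumptions each subband costs 14 bits per frame and the two flags add 2 bits
per channel, which is why updating only the flags once per block adds little overhead.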

It will be noted that Technique 2, described above (see also Table 1), provides a bin
frequency resolution rather than a subband frequency resolution (i.e., a different pseudo
random phase angle shift is applied to each bin rather than to each subband) even though
the same Subband Decorrelation Scale Factor applies to all bins in a subband. It will
also be noted that Technique 3, described above (see also Table 1), provides a block time
resolution (i.e., a different randomized phase angle shift is applied to each block
rather than to each frame) even though the same Subband Decorrelation Scale Factor
applies to all bins in a subband. Such resolutions, greater than the resolution of the
sidechain information, are possible because the randomized phase angle shifts may be
generated in a decoder and need not be known in the encoder (this is the case even if the
encoder also applies a randomized phase angle shift to the encoded mono composite signal,
an alternative that is described below). In other words, it is not necessary to send
sidechain information having bin or block granularity even though the decorrelation
techniques employ such granularity. The decoder may employ, for example, one or more
lookup tables of randomized bin phase angles. The obtaining of time and/or frequency
resolutions for decorrelation greater than the sidechain information rates is among the
aspects of the present invention. Thus, decorrelation by way of randomized phases is
performed either with a fine frequency resolution (bin-by-bin) that does not change with
time (Technique 2), or with a coarse frequency resolution (band-by-band) (or a fine
frequency resolution (bin-by-bin) when frequency interpolation is employed, as described
further below) and a fine time resolution (block rate) (Technique 3).

It will also be appreciated that as increasing degrees of randomized phase shifts are
added to the phase angle of a recovered channel, the absolute phase angle of the
recovered channel differs more and more from the original absolute phase angle of that
channel. An aspect of the present invention is the appreciation that the resulting
absolute phase angle of the recovered channel need not match that of the original channel
when signal conditions are such that the randomized phase shifts are added in accordance
with aspects of the present invention. For example, in extreme cases when the
Decorrelation Scale Factor causes the highest degree of randomized phase shift, the phase
shift caused by Technique 2 or Technique 3 overwhelms the basic phase shift caused by
Technique 1. Nevertheless, this is of no concern in that a randomized phase shift is
audibly the same as the different random phases in the original signal that give rise to
a Decorrelation Scale Factor that causes the addition of some degree of randomized phase
shifts.

As mentioned above, randomized amplitude shifts may be employed in addition to
randomized phase shifts. For example, the Adjust Amplitude may also be controlled by a
Randomized Amplitude Scale Factor Parameter derived from the recovered sidechain
Decorrelation Scale Factor for a particular channel and the recovered sidechain Transient
Flag for the particular channel. Such randomized amplitude shifts may operate in two
modes in a manner analogous to the application of randomized phase shifts. For example,
in the absence of a transient, a randomized amplitude shift that does not change with
time may be added on a bin-by-bin basis (different from bin to bin), and, in the presence
of a transient (in the frame or block), a randomized amplitude shift that changes on a
block-by-block basis (different from block to block) and changes from subband to subband
(the same shift for all bins in a subband; different from subband to subband). Although
the amount or degree to which randomized amplitude shifts are added may be controlled by
the Decorrelation Scale Factor, it is believed that a particular scale factor value
should cause less amplitude shift than the corresponding randomized phase shift resulting
from the same scale factor value in order to avoid audible artifacts.

When the Transient Flag applies to a frame, the time resolution with which the Transient
Flag selects Technique 2 or Technique 3 may be enhanced by providing a supplemental
transient detector in the decoder in order to provide a temporal resolution finer than
the frame rate or even the block rate. Such a supplemental transient detector may detect
the occurrence of a transient in the mono or multichannel composite audio signal received
by the decoder and such detection information is then sent to each Controllable
Decorrelator (as 38, 42 of FIG. 2). Then, upon the receipt of a Transient Flag for its
channel, the Controllable Decorrelator switches from Technique 2 to Technique 3 upon
receipt of the decoder's local transient detection indication. Thus, a substantial
improvement in temporal resolution is possible without increasing the sidechain bitrate,
albeit with decreased spatial accuracy (the encoder detects transients in each input
channel prior to their downmixing, whereas detection in the decoder is done after
downmixing).

As an alternative to sending sidechain information on a frame-by-frame basis,
sidechain information may be updated every block, at least for highly dynamic signals.
As mentioned above, updating the Transient Flag and/or the Interpolation Flag every
block results in only a small increase in sidechain data overhead. In order to accomplish
such an increase in temporal resolution for other sidechain information without
substantially increasing the sidechain data rate, a block-floating-point differential coding
arrangement may be used. For example, consecutive transform blocks may be collected
in groups of six over a frame. The full sidechain information may be sent for each
subband-channel in the first block. In the five subsequent blocks, only differential values
may be sent, each the difference between the current-block amplitude and angle, and the
equivalent values from the previous block. This results in very low data rate for static
signals, such as a pitch pipe note. For more dynamic signals, a greater range of difference
values is required, but at less precision. So, for each group of five differential values, an
exponent may be sent first, using, for example, 3 bits, then differential values are
quantized to, for example, 2-bit accuracy. This arrangement reduces the average worst-case
sidechain data rate by about a factor of two. Further reduction may be obtained by
omitting the sidechain data for a reference channel (since it can be derived from the other
channels), as discussed above, and by using, for example, arithmetic coding.
Alternatively or in addition, differential coding across frequency may be employed by
sending, for example, differences in subband angle or amplitude.
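
The block-floating-point arrangement described above may be illustrated by the following minimal sketch. The text gives only the field widths (a 3-bit exponent and 2-bit differential values); the power-of-two step size, the signed mantissa range of -2 to +1, the integer-valued differentials and the function names `bfp_encode` and `bfp_decode` are all assumptions made for illustration.

```python
def bfp_encode(diffs, exp_bits=3, mant_bits=2):
    """Encode one group of integer differentials with a single shared exponent."""
    max_exp = (1 << exp_bits) - 1        # largest exponent a 3-bit field can hold
    lo = -(1 << (mant_bits - 1))         # most negative mantissa (-2 for 2 bits)
    hi = (1 << (mant_bits - 1)) - 1      # most positive mantissa (+1 for 2 bits)
    peak = max(abs(d) for d in diffs)
    exponent = 0
    # Grow the shared step until the largest differential fits the mantissa range.
    while exponent < max_exp and peak > hi * (1 << exponent):
        exponent += 1
    step = 1 << exponent                 # assumed power-of-two step size
    mantissas = [max(lo, min(hi, round(d / step))) for d in diffs]
    return exponent, mantissas


def bfp_decode(exponent, mantissas):
    """Rebuild the (coarsened) differentials from the exponent and mantissas."""
    step = 1 << exponent
    return [m * step for m in mantissas]


# A static signal costs a zero exponent and all-zero mantissas; a dynamic one
# gets a wider range at coarser precision.
static_group = bfp_encode([0, 0, 0, 0, 0])
dynamic_group = bfp_encode([8, -4, 0, 4, -8])
```

Under these assumptions a group of five differentials occupies 3 + 5 x 2 = 13 bits. A practical encoder would presumably form each differential against the decoder's reconstructed value rather than the original, so that quantization error does not accumulate over the group.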

Whether sidechain information is sent on a frame-by-frame basis or more
frequently, it may be useful to interpolate sidechain values across the blocks in a frame.
Linear interpolation over time may be employed in the manner of the linear interpolation
across frequency, as described below.
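
One way to picture linear interpolation of a sidechain value across the blocks in a frame is sketched below. It assumes, hypothetically, that each frame-rate value applies at the last block of its frame and that there are six blocks per frame; the function name is illustrative only.

```python
def interpolate_across_blocks(previous_value, current_value, blocks_per_frame=6):
    """Ramp linearly from the previous frame's value to the current frame's value."""
    step = (current_value - previous_value) / blocks_per_frame
    # Block b (0-based) receives the value reached after b + 1 equal steps.
    return [previous_value + step * (b + 1) for b in range(blocks_per_frame)]


ramp = interpolate_across_blocks(0.0, 6.0)
```

The sketch treats the value as an ordinary scalar; an angle parameter would additionally need the wrap-around handling used elsewhere in this description.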

One suitable implementation of aspects of the present invention employs
processing steps or devices that implement the respective processing steps and are
functionally related as next set forth. Although the encoding and decoding steps listed
below may each be carried out by computer software instruction sequences operating in
the order of the below listed steps, it will be understood that equivalent or similar results
may be obtained by steps ordered in other ways, taking into account that certain quantities
are derived from earlier ones. For example, multi-threaded computer software instruction
sequences may be employed so that certain sequences of steps are carried out in parallel.
Alternatively, the described steps may be implemented as devices that perform the
described functions, the various devices having functions and functional interrelationships
as described hereinafter.

Encoding 

The encoder or encoding function may collect a frame's worth of data before it
derives sidechain information and downmixes the frame's audio channels to a single
monophonic (mono) audio channel (in the manner of the example of FIG. 1, described
above), or to multiple audio channels (in the manner of the example of FIG. 6, described
below). By doing so, sidechain information may be sent first to a decoder, allowing the
decoder to begin decoding immediately upon receipt of the mono or multiple channel
audio information. Steps of an encoding process ("encoding steps") may be described as
follows. With respect to encoding steps, reference is made to FIG. 4, which is in the
nature of a hybrid flowchart and functional block diagram. Through Step 419, FIG. 4
shows encoding steps for one channel. Steps 420 and 421 apply to all of the multiple
channels that are combined to provide a composite mono signal output or are matrixed
together to provide multiple channels, as described below in connection with the example
of FIG. 6.

Step 401. Detect Transients.

a. Perform transient detection of the PCM values in an input audio channel. 

b. Set a one-bit Transient Flag True if a transient is present in any block of a frame 
for the channel. 

Comments regarding Step 401: 

The Transient Flag forms a portion of the sidechain information and is also used
in Step 411, as described below. Transient resolution finer than block rate in the decoder
may improve decoder performance. Although, as discussed above, a block-rate rather
than a frame-rate Transient Flag may form a portion of the sidechain information with a
modest increase in bitrate, a similar result, albeit with decreased spatial accuracy, may be
accomplished without increasing the sidechain bitrate by detecting the occurrence of
transients in the mono composite signal received in the decoder.

There is one transient flag per channel per frame, which, because it is derived in
the time domain, necessarily applies to all subbands within that channel. The transient
detection may be performed in the manner similar to that employed in an AC-3 encoder
for controlling the decision of when to switch between long and short length audio
blocks, but with a higher sensitivity and with the Transient Flag True for any frame in
which the Transient Flag for a block is True (an AC-3 encoder detects transients on a
block basis). In particular, see Section 8.2.2 of the above-cited A/52A document. The
sensitivity of the transient detection described in Section 8.2.2 may be increased by
adding a sensitivity factor F to an equation set forth therein. Section 8.2.2 of the A/52A
document is set forth below, with the sensitivity factor added (Section 8.2.2 as reproduced
below is corrected to indicate that the low pass filter is a cascaded biquad direct form II
IIR filter rather than "form I" as in the published A/52A document; Section 8.2.2 was
correct in the earlier A/52 document). Although it is not critical, a sensitivity factor of
0.2 has been found to be a suitable value in a practical embodiment of aspects of the
present invention.

Alternatively, a similar transient detection technique described in U.S. Patent
5,394,473 may be employed. The '473 patent describes aspects of the A/52A document
transient detector in greater detail. Both said A/52A document and said '473 patent are
hereby incorporated by reference in their entirety.

As another alternative, transients may be detected in the frequency domain rather
than in the time domain (see the Comments to Step 408). In that case, Step 401 may be
omitted and an alternative step employed in the frequency domain as described below.

Step 402. Window and DFT. 

Multiply overlapping blocks of PCM time samples by a time window and convert 
them to complex frequency values via a DFT as implemented by an FFT.
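
Step 402 may be pictured with the following small sketch. The Hann window, the 50% overlap, the tiny block length and the function names are assumptions chosen for illustration, since this step does not fix them. A direct DFT is written out to keep the sketch self-contained; a practical implementation would use an FFT, which yields the same values.

```python
import cmath
import math


def hann_window(n):
    """Periodic Hann window (an assumed window shape)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def dft(block):
    """Direct DFT; an FFT computes the same complex bin values faster."""
    n = len(block)
    return [
        sum(block[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def windowed_spectra(pcm, block_len=8, hop=4):
    """Window overlapping blocks of PCM samples and transform each block."""
    window = hann_window(block_len)
    spectra = []
    for start in range(0, len(pcm) - block_len + 1, hop):
        block = [pcm[start + i] * window[i] for i in range(block_len)]
        spectra.append(dft(block))
    return spectra


spectra = windowed_spectra([1.0] * 16)
```

With 16 input samples, a block length of 8 and a hop of 4, the sketch produces three overlapping blocks of eight complex bins each.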

Step 403. Convert Complex Values to Magnitude and Angle. 
Convert each frequency-domain complex transform bin value (a + jb) to a
magnitude and angle representation using standard complex manipulations:
a. Magnitude = square_root (a^2 + b^2)
b. Angle = arctan (b/a)

Comments regarding Step 403:

Some of the following Steps use or use, as an alternative, the energy of a bin,
defined as the above magnitude squared (i.e., energy = (a^2 + b^2)).
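
The conversions of Step 403 may be sketched as follows. The function names are illustrative. The sketch uses the four-quadrant arctangent provided by `cmath.polar`, on the assumption that "arctan (b/a)" is meant to yield an angle over the full range of -π to +π; a plain arctan of the ratio b/a would not distinguish opposite quadrants.

```python
import cmath


def to_magnitude_angle(bin_value):
    """Return (magnitude, angle in radians) of a complex bin value a + jb."""
    return cmath.polar(bin_value)


def bin_energy(bin_value):
    """Energy of a bin: the magnitude squared, a^2 + b^2."""
    return bin_value.real ** 2 + bin_value.imag ** 2


magnitude, angle = to_magnitude_angle(3 + 4j)   # magnitude is 5.0
energy = bin_energy(3 + 4j)                     # energy is 25.0
```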
Step 404. Calculate Subband Energy. 
a. Calculate the subband energy per block by adding bin energy values within
each subband (a summation across frequency).

b. Calculate the subband energy per frame by averaging or accumulating the
energy in all the blocks in a frame (an averaging / accumulation across time).

c. If the coupling frequency of the encoder is below about 1000 Hz, apply the
subband frame-averaged or frame-accumulated energy to a time smoother that operates
on all subbands below that frequency and above the coupling frequency.
Comments regarding Step 404c:

Time smoothing to provide inter-frame smoothing in low frequency subbands may
be useful. In order to avoid artifact-causing discontinuities between bin values at subband
boundaries, it may be useful to apply a progressively-decreasing time smoothing from the
lowest frequency subband encompassing and above the coupling frequency (where the
smoothing may have a significant effect) up through a higher frequency subband in which
the time smoothing effect is measurable, but inaudible, although nearly audible. A
suitable time constant for the lowest frequency range subband (where the subband is a
single bin if subbands are critical bands) may be in the range of 50 to 100 milliseconds,
for example. Progressively-decreasing time smoothing may continue up through a
subband encompassing about 1000 Hz where the time constant may be about 10
milliseconds, for example.

Although a first-order smoother is suitable, the smoother may be a two-stage
smoother that has a variable time constant that shortens its attack and decay time in
response to a transient (such a two-stage smoother may be a digital equivalent of the
analog two-stage smoothers described in U.S. Patents 3,846,719 and 4,922,535, each of
which is hereby incorporated by reference in its entirety). In other words, the steady-state
time constant may be scaled according to frequency and may also be variable in response
to transients. Alternatively, such smoothing may be applied in Step 412.
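
Steps 404a and 404b may be sketched as follows; the optional time smoothing of Step 404c is not shown. The representation of subbands as half-open (low, high) bin index pairs and the function names are assumptions made for illustration.

```python
def subband_energy_per_block(bins, band_edges):
    """Step 404a: sum the bin energies inside each subband of one block."""
    return [
        sum(v.real ** 2 + v.imag ** 2 for v in bins[lo:hi])
        for lo, hi in band_edges
    ]


def subband_energy_per_frame(blocks, band_edges, average=True):
    """Step 404b: average or accumulate the block energies over a frame."""
    per_block = [subband_energy_per_block(b, band_edges) for b in blocks]
    totals = [sum(column) for column in zip(*per_block)]
    if average:
        return [t / len(blocks) for t in totals]
    return totals


# Two blocks of three bins; subband 0 holds bin 0, subband 1 holds bins 1 and 2.
frame = [[3 + 4j, 1 + 0j, 0 + 2j], [0j, 0 + 1j, 1 + 1j]]
edges = [(0, 1), (1, 3)]
accumulated = subband_energy_per_frame(frame, edges, average=False)
averaged = subband_energy_per_frame(frame, edges)
```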
Step 405. Calculate Sum of Bin Magnitudes. 

a. Calculate the sum per block of the bin magnitudes (Step 403) of each subband
(a summation across frequency).

b. Calculate the sum per frame of the bin magnitudes of each subband by
averaging or accumulating the magnitudes of Step 405a across the blocks in a frame (an
averaging / accumulation across time). These sums are used to calculate an Interchannel
Angle Consistency Factor in Step 410 below.

c. If the coupling frequency of the encoder is below about 1000 Hz, apply the
subband frame-averaged or frame-accumulated magnitudes to a time smoother that
operates on all subbands below that frequency and above the coupling frequency.

Comments regarding Step 405c: See comments regarding Step 404c except that
in the case of Step 405c, the time smoothing may alternatively be performed as part of
Step 410.

Step 406. Calculate Relative Interchannel Bin Phase Angle.

Calculate the relative interchannel phase angle of each transform bin of each block
by subtracting from the bin angle of Step 403 the corresponding bin angle of a reference
channel (for example, the first channel). The result, as with other angle additions or
subtractions herein, is taken modulo (π, -π) radians by adding or subtracting 2π until the
result is within the desired range of -π to +π.
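
The subtraction and wrapping of Step 406 may be sketched as follows. The function names are illustrative, and the treatment of the exact endpoints (both -π and +π are left unchanged) is an assumption.

```python
import math


def wrap_angle(angle):
    """Add or subtract 2*pi until the angle lies within -pi to +pi."""
    while angle > math.pi:
        angle -= 2 * math.pi
    while angle < -math.pi:
        angle += 2 * math.pi
    return angle


def relative_bin_angles(channel_angles, reference_angles):
    """Bin-by-bin angle of a channel relative to the reference channel."""
    return [wrap_angle(a - r) for a, r in zip(channel_angles, reference_angles)]


# 3.0 - (-3.0) = 6.0 radians lies outside the range and wraps to 6.0 - 2*pi.
wrapped = relative_bin_angles([3.0, 0.5], [-3.0, 0.2])
```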

Step 407. Calculate Interchannel Subband Phase Angle. 

For each channel, calculate a frame-rate amplitude-weighted average interchannel 
phase angle for each subband as follows: 

a. For each bin, construct a complex number from the magnitude of Step 403 
and the relative interchannel bin phase angle of Step 406. 

b. Add the constructed complex numbers of Step 407a across each subband (a
summation across frequency).

Comment regarding Step 407b: For example, if a subband has two bins and
one of the bins has a complex value of 1 + j1 and the other bin has a complex
value of 2 + j2, their complex sum is 3 + j3.

c. Average or accumulate the per block complex number sum for each
subband of Step 407b across the blocks of each frame (an averaging or
accumulation across time).

d. If the coupling frequency of the encoder is below about 1000 Hz, apply the
subband frame-averaged or frame-accumulated complex value to a time smoother
that operates on all subbands below that frequency and above the coupling
frequency.

Comments regarding Step 407d: See comments regarding Step 404c except
that in the case of Step 407d, the time smoothing may alternatively be performed
as part of Steps 407e or 410.

e. Compute the magnitude of the complex result of Step 407d as per Step 403.

Comment regarding Step 407e: This magnitude is used in Step 410a below.
In the simple example given in Step 407b, the magnitude of 3 + j3 is square_root
(9 + 9) = 4.24.

f. Compute the angle of the complex result as per Step 403.

Comments regarding Step 407f: In the simple example given in Step 407b,
the angle of 3 + j3 is arctan (3/3) = 45 degrees = π/4 radians. This subband angle
is signal-dependently time-smoothed (see Step 413) and quantized (see Step 414)
to generate the Subband Angle Control Parameter sidechain information, as
described below.
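
The amplitude-weighted averaging of Step 407 may be sketched as follows, omitting the optional time smoothing of Step 407d and using accumulation rather than averaging across blocks (the angle is the same either way). The function names and the data layout, one (magnitudes, relative angles) pair per block for a single subband, are assumptions made for illustration.

```python
import cmath


def subband_phasor(magnitudes, relative_angles):
    """Steps 407a and 407b: rebuild each bin as a complex number and sum them."""
    return sum(
        (cmath.rect(m, a) for m, a in zip(magnitudes, relative_angles)), 0j
    )


def frame_subband_angle(blocks):
    """Steps 407c, 407e and 407f: accumulate over blocks, then take polar form."""
    total = sum((subband_phasor(m, a) for m, a in blocks), 0j)
    return cmath.polar(total)   # (magnitude, angle in radians)


# The two-bin example of Step 407b: 1 + j1 and 2 + j2 sum to 3 + j3.
example = [([abs(1 + 1j), abs(2 + 2j)], [cmath.phase(1 + 1j), cmath.phase(2 + 2j)])]
magnitude, angle = frame_subband_angle(example)
```

Because each bin contributes a vector whose length is its magnitude, louder bins pull the resulting subband angle more strongly than quiet ones, which is the amplitude weighting referred to above.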

Step 408. Calculate Bin Spectral-Steadiness Factor 

For each bin, calculate a Bin Spectral-Steadiness Factor in the range of 0 to 1 as 
follows: 

a. Let xm = bin magnitude of present block calculated in Step 403.

b. Let ym = corresponding bin magnitude of previous block.

c. If xm > ym, then Bin Dynamic Amplitude Factor = (ym/xm);

d. Else if ym > xm, then Bin Dynamic Amplitude Factor = (xm/ym).

e. Else if ym = xm, then Bin Spectral-Steadiness Factor = 1.
Comment regarding Step 408:

"Spectral steadiness" is a measure of the extent to which spectral components
(e.g., spectral coefficients or bin values) change over time. A Bin Spectral-Steadiness
Factor of 1 indicates no change over a given time period.
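
The basic single-block form of Step 408 may be sketched as follows, taking the ratio exactly as it reads in items c through e above. The function name is illustrative, and the handling of a silent bin in both blocks (treated as equal magnitudes, giving 1) follows from item e rather than from any explicit statement in the text.

```python
def bin_spectral_steadiness(x_m, y_m):
    """Ratio of the smaller to the larger of two consecutive bin magnitudes."""
    if x_m == y_m:
        return 1.0          # no change from the previous block, including 0 and 0
    return min(x_m, y_m) / max(x_m, y_m)


halved = bin_spectral_steadiness(1.0, 2.0)      # magnitude fell to one half
quadrupled = bin_spectral_steadiness(4.0, 1.0)  # magnitude rose fourfold
```

The factor is symmetric: a rise and a fall by the same ratio give the same value, so it measures how much a bin changed and not the direction of the change.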

Spectral Steadiness may also be taken as an indicator of whether a transient is
present. A transient may cause a sudden rise and fall in spectral (bin) amplitude over a
time period of one or more blocks, depending on its position with regard to blocks and
their boundaries. Consequently, a change in the Bin Spectral-Steadiness Factor from a
high value to a low value over a small number of blocks may be taken as an indication of
the presence of a transient in the block or blocks having the lower value. A further
confirmation of the presence of a transient, or an alternative to employing the Bin
Spectral-Steadiness Factor, is to observe the phase angles of bins within the block (for
example, at the phase angle output of Step 403). Because a transient is likely to occupy a
single temporal position within a block and have the dominant energy in the block, the
existence and position of a transient may be indicated by a substantially uniform delay in
phase from bin to bin in the block, namely, a substantially linear ramp of phase angles as
a function of frequency. Yet a further confirmation or alternative is to observe the bin
amplitudes over a small number of blocks (for example, at the magnitude output of Step
403), namely by looking directly for a sudden rise and fall of spectral level.

Alternatively, Step 408 may look at three consecutive blocks instead of one block.
If the coupling frequency of the encoder is below about 1000 Hz, Step 408 may look at
more than three consecutive blocks. The number of consecutive blocks taken into
consideration may vary with frequency such that the number gradually increases as the
subband frequency range decreases. If the Bin Spectral-Steadiness Factor is obtained
from more than one block, the detection of a transient, as just described, may be
determined by separate steps that respond only to the number of blocks useful for
detecting transients.

As a further alternative, bin energies may be used instead of bin magnitudes.

As yet a further alternative, Step 408 may employ an "event decision" detecting
technique as described below in the comments following Step 409.

Step 409. Compute Subband Spectral-Steadiness Factor. 

Compute a frame-rate Subband Spectral-Steadiness Factor on a scale of 0 to 1 by 
forming an amplitude-weighted average of the Bin Spectral-Steadiness Factor within each 
subband across the blocks in a frame as follows: 

a. For each bin, calculate the product of the Bin Spectral-Steadiness Factor of Step 
408 and the bin magnitude of Step 403. 

b. Sum the products within each subband (a summation across frequency). 

c. Average or accumulate the summation of Step 409b in all the blocks in a frame 
(an averaging / accumulation across time). 

d. If the coupling frequency of the encoder is below about 1000 Hz, apply the
subband frame-averaged or frame-accumulated summation to a time smoother that
operates on all subbands below that frequency and above the coupling frequency.

Comments regarding Step 409d: See comments regarding Step 404c except that
in the case of Step 409d, there is no suitable subsequent step in which the time
smoothing may alternatively be performed.

e. Divide the results of Step 409c or Step 409d, as appropriate, by the sum of the
bin magnitudes (Step 403) within the subband.

Comment regarding Step 409e: The multiplication by the magnitude in Step
409a and the division by the sum of the magnitudes in Step 409e provide amplitude
weighting. The output of Step 408 is independent of absolute amplitude and, if not
amplitude weighted, may cause the output of Step 409 to be controlled by very small
amplitudes, which is undesirable.

f. Scale the result to obtain the Subband Spectral-Steadiness Factor by mapping
the range from {0.5...1} to {0...1}. This may be done by multiplying the result by 2,
subtracting 1, and limiting results less than 0 to a value of 0.

Comment regarding Step 409f: Step 409f may be useful in assuring that a 
channel of noise results in a Subband Spectral-Steadiness Factor of zero. 
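
Step 409 may be sketched as follows, omitting the optional time smoothing of Step 409d. The sketch assumes that both the weighted sum and the sum of magnitudes are accumulated over the same blocks of the frame, and that a silent subband yields 0; neither point is stated explicitly above. The function name and the data layout, a list of (steadiness, magnitude) pairs per block, are illustrative.

```python
def subband_spectral_steadiness(blocks):
    """Amplitude-weighted average of bin steadiness, rescaled per Step 409f."""
    # Steps 409a to 409c: weight each bin factor by its magnitude and accumulate.
    weighted = sum(s * m for block in blocks for s, m in block)
    # Step 409e: the divisor is the accumulated sum of the bin magnitudes.
    total_magnitude = sum(m for block in blocks for _, m in block)
    if total_magnitude == 0:
        return 0.0          # assumed result for a silent subband
    raw = weighted / total_magnitude
    # Step 409f: map {0.5...1} onto {0...1} and limit negative results to 0.
    return max(0.0, 2.0 * raw - 1.0)


# One block, two bins of equal magnitude with factors 1.0 and 0.5.
factor = subband_spectral_steadiness([[(1.0, 2.0), (0.5, 2.0)]])
```

In this example the weighted average is 0.75, which Step 409f maps to 0.5.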

Comments regarding Steps 408 and 409:

The goal of Steps 408 and 409 is to measure spectral steadiness, that is, changes in
spectral composition over time in a subband of a channel. Alternatively, aspects of an
"event decision" sensing such as described in International Publication Number WO
02/097792 A1 (designating the United States) may be employed to measure spectral
steadiness instead of the approach just described in connection with Steps 408 and 409.
U.S. Patent Application S.N. 10/478,538, filed November 20, 2003 is the United States'
national application of the published PCT Application WO 02/097792 A1. Both the
published PCT application and the U.S. application are hereby incorporated by reference
in their entirety. According to these incorporated applications, the magnitudes of the
complex FFT coefficient of each bin are calculated and normalized (largest magnitude is
set to a value of one, for example). Then the magnitudes of corresponding bins (in dB) in
consecutive blocks are subtracted (ignoring signs), the differences between bins are
summed, and, if the sum exceeds a threshold, the block boundary is considered to be an
auditory event boundary. Alternatively, changes in amplitude from block to block may
also be considered along with spectral magnitude changes (by looking at the amount of
normalization required).

If aspects of the incorporated event-sensing applications are employed to measure
spectral steadiness, normalization may not be required and the changes in spectral
magnitude (changes in amplitude would not be measured if normalization is omitted)
preferably are considered on a subband basis. Instead of performing Step 408 as
indicated above, the decibel differences in spectral magnitude between corresponding
bins in each subband may be summed in accordance with the teachings of said
applications. Then, each of those sums, representing the degree of spectral change from
block to block, may be scaled so that the result is a spectral steadiness factor having a
range from 0 to 1, wherein a value of 1 indicates the highest steadiness, a change of 0 dB
from block to block for a given bin. A value of 0, indicating the lowest steadiness, may
be assigned to decibel changes equal to or greater than a suitable amount, such as 12 dB,
for example. These results, a Bin Spectral-Steadiness Factor, may be used by Step 409 in
the same manner that Step 409 uses the results of Step 408 as described above. When
Step 409 receives a Bin Spectral-Steadiness Factor obtained by employing the just-
described alternative event decision sensing technique, the Subband Spectral-Steadiness
Factor of Step 409 may also be used as an indicator of a transient. For example, if the
range of values produced by Step 409 is 0 to 1, a transient may be considered to be
present when the Subband Spectral-Steadiness Factor is a small value, such as, for
example, 0.1, indicating substantial spectral unsteadiness.

It will be appreciated that the Bin Spectral-Steadiness Factor produced by Step
408 and by the just-described alternative to Step 408 each inherently provide a variable
threshold to a certain degree in that they are based on relative changes from block to
block. Optionally, it may be useful to supplement such inherency by specifically
providing a shift in the threshold in response to, for example, multiple transients in a
frame or a large transient among smaller transients (e.g., a loud transient coming atop
mid- to low-level applause). In the case of the latter example, an event detector may
initially identify each clap as an event, but a loud transient (e.g., a drum hit) may make it
desirable to shift the threshold so that only the drum hit is identified as an event.

Alternatively, a randomness metric may be employed (for example, as described
in U.S. Patent Re 36,714, which is hereby incorporated by reference in its entirety)
instead of a measure of spectral-steadiness over time.

Step 410. Calculate Interchannel Angle Consistency Factor.

For each subband having more than one bin, calculate a frame-rate Interchannel
Angle Consistency Factor as follows:

a. Divide the magnitude of the complex sum of Step 407e by the sum of the
magnitudes of Step 405. The resulting "raw" Angle Consistency Factor is a
number in the range of 0 to 1.

b. Calculate a correction factor: let n = the number of values across the
subband contributing to the two quantities in the above step (in other words, "n" is
the number of bins in the subband). If n is less than 2, let the Angle Consistency
Factor be 1 and go to Steps 411 and 413.

c. Let r = Expected Random Variation = 1/n. Subtract r from the result of the
Step 410b.

d. Normalize the result of Step 410c by dividing by (1 - r). The result has a
maximum value of 1. Limit the minimum value to 0 as necessary.
Comments regarding Step 410: 

Interchannel Angle Consistency is a measure of how similar the interchannel
phase angles are within a subband over a frame period. If all bin interchannel angles of
the subband are the same, the Interchannel Angle Consistency Factor is 1.0; whereas, if
the interchannel angles are randomly scattered, the value approaches zero.

The Subband Angle Consistency Factor indicates if there is a phantom image
between the channels. If the consistency is low, then it is desirable to decorrelate the
channels. A high value indicates a fused image. Image fusion is independent of other
signal characteristics.

It will be noted that the Subband Angle Consistency Factor, although an angle
parameter, is determined indirectly from two magnitudes. If the interchannel angles are
all the same, adding the complex values and then taking the magnitude yields the same
result as taking all the magnitudes and adding them, so the quotient is 1. If the
interchannel angles are scattered, adding the complex values (such as adding vectors
having different angles) results in at least partial cancellation, so the magnitude of the
sum is less than the sum of the magnitudes, and the quotient is less than 1.

Following is a simple example of a subband having two bins:

Suppose that the two complex bin values are (3 + j4) and (6 + j8). (Same angle
each case: angle = arctan (imag/real), so angle1 = arctan (4/3) and angle2 = arctan (8/6) =
arctan (4/3)). Adding complex values, sum = (9 + j12), magnitude of which is
square_root (81 + 144) = 15.

The sum of the magnitudes is magnitude of (3 + j4) + magnitude of (6 + j8) = 5 +
10 = 15. The quotient is therefore 15/15 = 1 = consistency (before 1/n normalization,
would also be 1 after normalization) (Normalized consistency = (1 - 0.5) / (1 - 0.5) = 1.0).

If one of the above bins has a different angle, say that the second one has complex
value (6 - j8), which has the same magnitude, 10. The complex sum is now (9 - j4),
which has magnitude of square_root (81 + 16) = 9.85, so the quotient is 9.85 / 15 = 0.66 =
consistency (before normalization). To normalize, subtract 1/n = 1/2, and divide by (1 -
1/n) (normalized consistency = (0.66 - 0.5) / (1 - 0.5) = 0.32).

Although the above-described technique for determining a Subband Angle
Consistency Factor has been found useful, its use is not critical. Other suitable techniques
may be employed. For example, one could calculate a standard deviation of angles using
standard formulae. In any case, it is desirable to employ amplitude weighting to
minimize the effect of small signals on the calculated consistency value.

In addition, an alternative derivation of the Subband Angle Consistency Factor
may use energy (the squares of the magnitudes) instead of magnitude. This may be
accomplished by squaring the magnitude from Step 403 before it is applied to Steps 405
and 407.
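
The two-bin example in the comments on Step 410 can be checked numerically with the following sketch. For brevity it operates on a single set of complex bin values rather than on the frame-accumulated sums of Steps 405 and 407; the function name and the value returned for a silent subband are assumptions.

```python
def angle_consistency(bins):
    """Normalized ratio of |sum of complex bins| to the sum of bin magnitudes."""
    n = len(bins)
    if n < 2:
        return 1.0                      # Step 410b: a single bin is fully consistent
    sum_of_magnitudes = sum(abs(v) for v in bins)
    if sum_of_magnitudes == 0:
        return 0.0                      # assumed result for a silent subband
    raw = abs(sum(bins)) / sum_of_magnitudes        # Step 410a
    r = 1.0 / n                                     # Step 410c: expected random variation
    return min(1.0, max(0.0, (raw - r) / (1.0 - r)))  # Step 410d


same_angle = angle_consistency([3 + 4j, 6 + 8j])
mixed_angle = angle_consistency([3 + 4j, 6 - 8j])
```

The first case gives exactly 1. The second gives about 0.31 at full precision; the figure of 0.32 in the worked example results from rounding the quotient to 0.66 before normalizing.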

Step 411. Derive Subband Decorrelation Scale Factor.

Derive a frame-rate Decorrelation Scale Factor for each subband as follows:

a. Let x = frame-rate Spectral-Steadiness Factor of Step 409f.

b. Let y = frame-rate Angle Consistency Factor of Step 410e.

c. Then the frame-rate Subband Decorrelation Scale Factor = (1 - x) * (1 - y),
a number between 0 and 1.

Comments regarding Step 411:

The Subband Decorrelation Scale Factor is a function of the spectral-steadiness of
signal characteristics over time in a subband of a channel (the Spectral-Steadiness Factor)
and the consistency in the same subband of a channel of bin angles with respect to
corresponding bins of a reference channel (the Interchannel Angle Consistency Factor).
The Subband Decorrelation Scale Factor is high only if both the Spectral-Steadiness
Factor and the Interchannel Angle Consistency Factor are low.

As explained above, the Decorrelation Scale Factor controls the degree of
envelope decorrelation provided in the decoder. Signals that exhibit spectral steadiness
over time preferably should not be decorrelated by altering their envelopes, regardless of
what is happening in other channels, as it may result in audible artifacts, namely wavering
or warbling of the signal.

Step 412. Derive Subband Amplitude Scale Factors. 

From the subband frame energy values of Step 404 and from the subband frame
energy values of all other channels (as may be obtained by a step corresponding to Step
404 or an equivalent thereof), derive frame-rate Subband Amplitude Scale Factors as
follows:

a. For each subband, sum the energy values per frame across all input channels.

b. Divide each subband energy value per frame (from Step 404) by the sum of the
energy values across all input channels (from Step 412a) to create values in the range
of 0 to 1.

c. Convert each ratio to dB, in the range of -∞ to 0.

d. Divide by the scale factor granularity, which may be set at 1.5 dB, for example,
change sign to yield a non-negative value, limit to a maximum value which may be, for
example, 31 (i.e., 5-bit precision) and round to the nearest integer to create the quantized
value. These values are the frame-rate Subband Amplitude Scale Factors and are
conveyed as part of the sidechain information.

e. If the coupling frequency of the encoder is below about 1000 Hz, apply the
subband frame-averaged or frame-accumulated magnitudes to a time smoother that
operates on all subbands below that frequency and above the coupling frequency.

Comments regarding Step 412e: See comments regarding Step 404c except that
in the case of Step 412e, there is no suitable subsequent step in which the time smoothing
may alternatively be performed.

Comments for Step 412:

Although the granularity (resolution) and quantization precision indicated here
have been found to be useful, they are not critical and other values may provide
acceptable results.

Alternatively, one may use amplitude instead of energy to generate the Subband
Amplitude Scale Factors. If using amplitude, one would use dB=20*log(amplitude ratio),
else if using energy, one converts to dB via dB=10*log(energy ratio), where amplitude
ratio = square root (energy ratio).
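
Steps 412a through 412d may be sketched for a single subband as follows, using the energy form dB = 10*log(energy ratio) with a base-10 logarithm. The function name, the assignment of the maximum code to a silent channel and the rounding of exact halves upward are assumptions made for illustration.

```python
import math


def amplitude_scale_factors(channel_energies, granularity_db=1.5, max_code=31):
    """Quantized Subband Amplitude Scale Factors for one subband, one per channel."""
    total = sum(channel_energies)                     # Step 412a
    codes = []
    for energy in channel_energies:
        if energy <= 0 or total <= 0:
            codes.append(max_code)                    # -infinity dB limits to the maximum
            continue
        ratio_db = 10.0 * math.log10(energy / total)  # Steps 412b and 412c: -inf..0 dB
        code = int(-ratio_db / granularity_db + 0.5)  # Step 412d: scale, flip sign, round
        codes.append(min(max_code, code))
    return codes


equal_split = amplitude_scale_factors([1.0, 1.0])   # each channel is about -3 dB
one_sided = amplitude_scale_factors([4.0, 0.0])     # all energy in the first channel
```

With two equally loud channels each ratio is 0.5, or about -3.01 dB, which quantizes to 2 steps of 1.5 dB.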

Step 413. Signal-Dependently Time Smooth Interchannel Subband Phase Angles.

Apply signal-dependent temporal smoothing to subband frame-rate interchannel
angles derived in Step 407f:

a. Let v = Subband Spectral-Steadiness Factor of Step 409d.

b. Let w = corresponding Angle Consistency Factor of Step 410e.

c. Let x = (1 - v) * w. This is a value between 0 and 1, which is high if the
Spectral-Steadiness Factor is low and the Angle Consistency Factor is high.

d. Let y = 1 - x. y is high if Spectral-Steadiness Factor is high and Angle
Consistency Factor is low.

e. Let z = y^exp, where exp is a constant, which may be = 0.1. z is also in the
range of 0 to 1, but skewed toward 1, corresponding to a slow time constant.

f. If the Transient Flag (Step 401) for the channel is set, set z = 0,
corresponding to a fast time constant in the presence of a transient.

g. Compute lim, a maximum allowable value of z, lim = 1 - (0.1 * w). This
ranges from 0.9 if the Angle Consistency Factor is high to 1.0 if the Angle
Consistency Factor is low (0).

h. Limit z by lim as necessary: if (z > lim) then z = lim.

i. Smooth the subband angle of Step 407f using the value of z and a running
smoothed value of angle maintained for each subband. If A = angle of Step 407f
and RSA = running smoothed angle value as of the previous block, and NewRSA
is the new value of the running smoothed angle, then: NewRSA = RSA * z + A *
(1 - z). The value of RSA is subsequently set equal to NewRSA before
processing the following block. NewRSA is the signal-dependently time-
smoothed angle output of Step 413.
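
Steps 413a through 413i may be sketched as follows. The function names are illustrative, and the recursion is applied to the angle value literally as written in Step 413i, without any special treatment of wrap-around at ±π.

```python
def smoothing_coefficient(v, w, transient, exponent=0.1):
    """Steps 413c to 413h: derive z from steadiness v and angle consistency w."""
    x = (1.0 - v) * w
    y = 1.0 - x
    z = y ** exponent          # in 0..1, skewed toward 1 (slow time constant)
    if transient:
        z = 0.0                # fast time constant when the Transient Flag is set
    limit = 1.0 - 0.1 * w      # 0.9 for high consistency, 1.0 for zero consistency
    return min(z, limit)


def smooth_angle(running_smoothed_angle, angle, z):
    """Step 413i: NewRSA = RSA * z + A * (1 - z)."""
    return running_smoothed_angle * z + angle * (1.0 - z)


# A steady subband with consistent angles: z is capped at 0.9, so the running
# value moves only one tenth of the way toward the new angle.
z = smoothing_coefficient(v=1.0, w=1.0, transient=False)
new_rsa = smooth_angle(0.0, 1.0, z)
```

When the Transient Flag is set, z is 0 and the running smoothed angle jumps directly to the new angle.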

Comments regarding Step 413: 

When a transient is detected, the subband angle update time constant is set to 0,
allowing a rapid subband angle change. This is desirable because it allows the normal
angle update mechanism to use a range of relatively slow time constants, minimizing
image wandering during static or quasi-static signals, yet fast-changing signals are treated
with fast time constants.

Although other smoothing techniques and parameters may be usable, a first-order
smoother implementing Step 413 has been found to be suitable. If implemented as a first-
order smoother / lowpass filter, the variable "z" corresponds to the feed-forward
coefficient (sometimes denoted "ff0"), while "(1-z)" corresponds to the feedback
coefficient (sometimes denoted "fb1").

Step 414. Quantize Smoothed Interchannel Subband Phase Angles. 

Quantize the time-smoothed subband interchannel angles derived in Step 413i to
obtain the Subband Angle Control Parameter:

a. If the value is less than 0, add 2π, so that all angle values to be quantized are
in the range 0 to 2π.

b. Divide by the angle granularity (resolution), which may be 2π / 64 radians,
and round to an integer. The maximum value may be set at 63, corresponding to
6-bit quantization.

Comments regarding Step 414:

The quantized value is treated as a non-negative integer, so an easy way to
quantize the angle is to map it to a non-negative floating point number (add 2π if less
than 0, making the range 0 to (less than) 2π), scale by the granularity (resolution), and
round to an integer. Similarly, dequantizing that integer (which could otherwise be done
with a simple table lookup) can be accomplished by scaling by the inverse of the angle
granularity factor, converting a non-negative integer to a non-negative floating point
angle (again, range 0 to 2π), after which it can be renormalized to the range ±π for further
use. Although such quantization of the Subband Angle Control Parameter has been found
to be useful, such a quantization is not critical and other quantizations may provide
acceptable results.
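
The quantization and dequantization just described may be sketched as follows, assuming the example granularity of 2π / 64 radians and input angles in the range -π to +π. The function names are hypothetical, and clamping an index of 64 to 63 (rather than wrapping it to 0) is one reading of "the maximum value may be set at 63".

```python
import math

LEVELS = 64                        # 6-bit quantization (example value)
STEP = 2.0 * math.pi / LEVELS      # angle granularity in radians


def quantize_angle(angle):
    """Map an angle in [-pi, pi] to an integer index in 0..63."""
    if angle < 0.0:
        angle += 2.0 * math.pi     # Step 414a: move into the range 0 to 2*pi
    index = int(math.floor(angle / STEP + 0.5))   # Step 414b: scale and round
    return min(index, LEVELS - 1)  # cap at 63


def dequantize_angle(index):
    """Map an index back to an angle, renormalized to the range +/- pi."""
    angle = index * STEP           # non-negative angle, 0 to (less than) 2*pi
    if angle > math.pi:
        angle -= 2.0 * math.pi
    return angle
```

For instance, an angle of -π/2 is first mapped to 3π/2, which quantizes to index 48 and dequantizes back to -π/2.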

Step 415. Quantize Subband Decorrelation Scale Factors. 

Quantize the Subband Decorrelation Scale Factors produced by Step 411 to, for
example, 8 levels (3 bits) by multiplying by 7.49 and rounding to the nearest integer. 
These quantized values are part of the sidechain information. 

Comments regarding Step 415:

Although such quantization of the Subband Decorrelation Scale Factors has been
found to be useful, quantization using the example values is not critical and other 
quantizations may provide acceptable results. 
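
As a minimal sketch of the example quantization of Step 415, assuming scale factors in the range 0 to 1 (the function name is hypothetical). Rounding is done half-up explicitly, and the multiplier 7.49 keeps a scale factor of exactly 1.0 at level 7, so that only the eight levels 0 to 7 are produced.

```python
import math


def quantize_decorrelation(scale_factor):
    """Quantize a Subband Decorrelation Scale Factor (0..1) to a level 0..7."""
    return int(math.floor(scale_factor * 7.49 + 0.5))


# Levels produced for a few example inputs.
levels = [quantize_decorrelation(s) for s in (0.0, 0.25, 0.5, 0.75, 1.0)]
```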

Step 416. Dequantize Subband Angle Control Parameters. 

Dequantize the Subband Angle Control Parameters (see Step 414), to use prior to
downmixing.



Comment regarding Step 416: 

Use of quantized values in the encoder helps maintain synchrony between the 
encoder and the decoder. 

Step 417. Distribute Frame-Rate Dequantized Subband Angle Control 
Parameters Across Blocks. 

In preparation for downmixing, distribute the once-per-frame dequantized
Subband Angle Control Parameters of Step 416 across time to the subbands of each block
within the frame.

Comment regarding Step 417: 

The same frame value may be assigned to each block in the frame. Alternatively, 
it may be useful to interpolate the Subband Angle Control Parameter values across the
blocks in a frame. Linear interpolation over time may be employed in the manner of the 
linear interpolation across frequency, as described below. 

Step 418. Interpolate Block Subband Angle Control Parameters to Bins.

Distribute the block Subband Angle Control Parameters of Step 417 for each
channel across frequency to bins, preferably using linear interpolation as described below.

Comment regarding Step 418: 

If linear interpolation across frequency is employed, Step 418 minimizes phase
angle changes from bin to bin across a subband boundary, thereby minimizing aliasing
artifacts. Such linear interpolation may be enabled, for example, as described below
following the description of Step 422. Subband angles are calculated independently of
one another, each representing an average across a subband. Thus, there may be a large
change from one subband to the next. If the net angle value for a subband is applied to all
bins in the subband (a "rectangular" subband distribution), the entire phase change from
one subband to a neighboring subband occurs between two bins. If there is a strong
signal component there, there may be severe, possibly audible, aliasing. Linear
interpolation, between the centers of each subband, for example, spreads the phase angle
change over all the bins in the subband, minimizing the change between any pair of bins,
so that, for example, the angle at the low end of a subband mates with the angle at the
high end of the subband below it, while maintaining the overall average the same as the
given calculated subband angle. In other words, instead of rectangular subband
distributions, the subband angle distribution may be trapezoidally shaped.

For example, suppose that the lowest coupled subband has one bin and a subband
angle of 20 degrees, the next subband has three bins and a subband angle of 40 degrees,
and the third subband has five bins and a subband angle of 100 degrees. With no
interpolation, assume that the first bin (one subband) is shifted by an angle of 20 degrees,
the next three bins (another subband) are shifted by an angle of 40 degrees and the next
five bins (a further subband) are shifted by an angle of 100 degrees. In that example,
there is a 60-degree maximum change, from bin 4 to bin 5. With linear interpolation, the
first bin still is shifted by an angle of 20 degrees, the next 3 bins are shifted by about 30,
40, and 50 degrees; and the next five bins are shifted by about 67, 83, 100, 117, and 133
degrees. The average subband angle shift is the same, but the maximum bin-to-bin
change is reduced to 17 degrees.
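
The figures in this example can be checked numerically. The short sketch below does not derive the interpolated angles; it simply takes the rounded per-bin values quoted above and confirms the two properties claimed for them, namely that the per-subband averages are unchanged and that the largest bin-to-bin step falls from 60 to 17 degrees. The helper names are hypothetical.

```python
# Per-bin angles in degrees, as quoted in the example (9 bins in 3 subbands).
rectangular = [20, 40, 40, 40, 100, 100, 100, 100, 100]
interpolated = [20, 30, 40, 50, 67, 83, 100, 117, 133]
subband_sizes = [1, 3, 5]


def max_step(angles):
    """Largest absolute change between adjacent bins."""
    return max(abs(b - a) for a, b in zip(angles, angles[1:]))


def subband_means(angles, sizes):
    """Average angle of each subband, given the number of bins per subband."""
    means, start = [], 0
    for size in sizes:
        means.append(sum(angles[start:start + size]) / size)
        start += size
    return means
```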

Optionally, changes in amplitude from subband to subband, in connection with
this and other steps described herein, such as Step 417, may also be treated in a similar
interpolative fashion. However, it may not be necessary to do so because there tends to
be more natural continuity in amplitude from one subband to the next.

Step 419. Apply Phase Angle Rotation to Bin Transform Values for Channel.

Apply phase angle rotation to each bin transform value as follows: 

a. Let x = bin angle for this bin as calculated in Step 418.

b. Let y = -x.

c. Compute z, a unity-magnitude complex phase rotation scale factor with
angle y, z = cos (y) + j sin (y).

d. Multiply the bin value (a + jb) by z.

Comments regarding Step 419:

The phase angle rotation applied in the encoder is the inverse of the angle derived
from the Subband Angle Control Parameter.

Phase angle adjustments, as described herein, in an encoder or encoding process
prior to downmixing (Step 420) have several advantages: (1) they minimize cancellations
of the channels that are summed to a mono composite signal or matrixed to multiple
channels, (2) they minimize reliance on energy normalization (Step 421), and (3) they
precompensate the decoder inverse phase angle rotation, thereby reducing aliasing.

The phase correction factors can be applied in the encoder by subtracting each
subband phase correction value from the angles of each transform bin value in that
subband. This is equivalent to multiplying each complex bin value by a complex number
with a magnitude of 1.0 and an angle equal to the negative of the phase correction factor.
Note that a complex number of magnitude 1, angle A is equal to cos(A) + j sin(A). This
latter quantity is calculated once for each subband of each channel, with A = -phase
correction for this subband, then multiplied by each bin complex signal value to realize
the phase shifted bin value.

The phase shift is circular, resulting in circular convolution (as mentioned above).
While circular convolution may be benign for some continuous signals, it may create
spurious spectral components for certain continuous complex signals (such as a pitch
pipe) or may cause blurring of transients if different phase angles are used for different
subbands. Consequently, a suitable technique to avoid circular convolution may be
employed or the Transient Flag may be employed such that, for example, when the 
Transient Flag is True, the angle calculation results may be overridden, and all subbands 
in a channel may use the same phase correction factor such as zero or a randomized 
value. 
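
The encoder phase rotation of Step 419 may be sketched as follows for the case in which one phase correction value is applied to every bin of a subband. The function names are hypothetical; `cmath.rect(1.0, A)` returns the unity-magnitude complex number cos(A) + j sin(A).

```python
import cmath


def rotation_factor(phase_correction):
    """Unity-magnitude factor with angle A = -phase_correction."""
    return cmath.rect(1.0, -phase_correction)


def rotate_subband(bins, phase_correction):
    """Multiply every complex bin value of a subband by the same factor."""
    z = rotation_factor(phase_correction)   # computed once per subband
    return [b * z for b in bins]
```

For example, a bin value of j (angle π/2) rotated with a phase correction of π/2 becomes 1 (angle 0), with its magnitude unchanged.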

Step 420. Downmix. 

Downmix to mono by adding the corresponding complex transform bins across
channels to produce a mono composite channel or downmix to multiple channels by 
matrixing the input channels, as for example, in the manner of the example of FIG. 6, as 
described below. 

Comments regarding Step 420: 

In the encoder, once the transform bins of all the channels have been phase 
shifted, the channels are summed, bin-by-bin, to create the mono composite audio signal.
Alternatively, the channels may be applied to a passive or active matrix that provides
either a simple summation to one channel, as in the N:1 encoding of FIG. 1, or to multiple
channels. The matrix coefficients may be real or complex (real and imaginary). 

Step 421. Normalize. 

To avoid cancellation of isolated bins and over-emphasis of in-phase signals,
normalize the amplitude of each bin of the mono composite channel to have substantially
the same energy as the sum of the contributing energies, as follows:

a. Let x = the sum across channels of bin energies (i.e., the squares of the bin
magnitudes computed in Step 403).

b. Let y = energy of corresponding bin of the mono composite channel,
calculated as per Step 403.

c. Let z = scale factor = square_root (x/y). If x = 0 then y is 0 and z is set to 1.

d. Limit z to a maximum value of, for example, 100. If z is initially greater
than 100 (implying strong cancellation from downmixing), add an arbitrary value,
for example, 0.01 * square_root (x) to the real and imaginary parts of the mono
composite bin, which will assure that it is large enough to be normalized by the
following step.

e. Multiply the complex mono composite bin value by z.
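
For a single bin, the downmix of Step 420 (simple summation) followed by the normalization of Step 421 may be sketched as below. This is one possible reading of sub-step d, in which z is clamped to the limit and the small offset is added to the composite bin before the final multiplication; the function name and parameter names are hypothetical.

```python
import math


def downmix_and_normalize(channel_bins, max_scale=100.0, offset_factor=0.01):
    """Sum one bin across channels and rescale it toward the summed energy.

    channel_bins: the (phase-rotated) complex value of this bin in each channel.
    """
    x = sum(abs(b) ** 2 for b in channel_bins)   # a: sum of channel bin energies
    mono = sum(channel_bins)                     # Step 420: mono composite bin
    y = abs(mono) ** 2                           # b: energy of the composite bin
    if x == 0.0:
        return mono                              # c: z is set to 1
    z = math.sqrt(x / y) if y > 0.0 else math.inf
    if z > max_scale:                            # d: strong cancellation
        offset = offset_factor * math.sqrt(x)
        mono += complex(offset, offset)          # nudge real and imaginary parts
        z = max_scale
    return mono * z                              # e
```

Two equal in-phase bins of value 1 sum to 2 (energy 4) and are scaled back so that the composite energy equals 2, the sum of the contributing energies.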

Comments regarding Step 421:

Although it is generally desirable to use the same phase factors for both encoding
and decoding, even the optimal choice of a subband phase correction value may cause
one or more audible spectral components within the subband to be cancelled during the
encode downmix process because the phase shifting of Step 419 is performed on a
subband rather than a bin basis. In this case, a different phase factor for isolated bins in
the encoder may be used if it is detected that the sum energy of such bins is much less
than the energy sum of the individual channel bins at that frequency. It is generally not
necessary to apply such an isolated correction factor to the decoder, inasmuch as isolated
bins usually have little effect on overall image quality. A similar normalization may be
applied if multiple channels rather than a mono channel are employed.

Step 422. Assemble and Pack into Bitstream(s). 

The Amplitude Scale Factors, Angle Control Parameters, Decorrelation Scale
Factors, and Transient Flags side channel information for each channel, along with the
common mono composite audio or the matrixed multiple channels are multiplexed as may
be desired and packed into one or more bitstreams suitable for the storage, transmission
or storage and transmission medium or media.

Comment regarding Step 422 : 

The mono composite audio or the multiple channel audio may be applied to a
data-rate reducing encoding process or device such as, for example, a perceptual encoder
or to a perceptual encoder and an entropy coder (e.g., arithmetic or Huffman coder)
(sometimes referred to as a "lossless" coder) prior to packing. Also, as mentioned above,
the mono composite audio (or the multiple channel audio) and related sidechain
information may be derived from multiple input channels only for audio frequencies
above a certain frequency (a "coupling" frequency). In that case, the audio frequencies
below the coupling frequency in each of the multiple input channels may be stored,
transmitted or stored and transmitted as discrete channels or may be combined or
processed in some manner other than as described herein. Discrete or otherwise-combined
channels may also be applied to a data reducing encoding process or device
such as, for example, a perceptual encoder or a perceptual encoder and an entropy
encoder. The mono composite audio (or the multiple channel audio) and the discrete
multichannel audio may all be applied to an integrated perceptual encoding or perceptual
and entropy encoding process or device prior to packing.

Optional Interpolation Flag (Not shown in FIG. 4) 

Interpolation across frequency of the basic phase angle shifts provided by the
Subband Angle Control Parameters may be enabled in the Encoder (Step 418) and/or in
the Decoder (Step 505, below). The optional Interpolation Flag sidechain parameter may
be employed for enabling interpolation in the Decoder. Either the Interpolation Flag or
an enabling flag similar to the Interpolation Flag may be used in the Encoder. Note that
because the Encoder has access to data at the bin level, it may use different interpolation
values than the Decoder, which interpolates the Subband Angle Control Parameters in the
sidechain information.

The use of such interpolation across frequency in the Encoder or the Decoder may
be enabled if, for example, either of the following two conditions is true:

Condition 1. If a strong, isolated spectral peak is located at or near the
boundary of two subbands that have substantially different phase rotation angle
assignments.

Reason: without interpolation, a large phase change at the boundary may
introduce a warble in the isolated spectral component. By using interpolation to
spread the band-to-band phase change across the bin values within the band, the
amount of change at the subband boundaries is reduced. Thresholds for spectral
peak strength, closeness to a boundary and difference in phase rotation from
subband to subband to satisfy this condition may be adjusted empirically.

Condition 2. If, depending on the presence of a transient, either the
interchannel phase angles (no transient) or the absolute phase angles within a
channel (transient), comprise a good fit to a linear progression.

Reason: Using interpolation to reconstruct the data tends to provide a
better fit to the original data. Note that the slope of the linear progression need
not be constant across all frequencies, only within each subband, since angle data
will still be conveyed to the decoder on a subband basis; and that forms the input
to the Interpolator Step 418. The degree to which the data provides a good fit to
satisfy this condition may also be determined empirically.

Other conditions, such as those determined empirically, may benefit from
interpolation across frequency. The existence of the two conditions just mentioned may
be determined as follows:

Condition 1. If a strong, isolated spectral peak is located at or near the 
boundary of two subbands that have substantially different phase rotation angle 
assignments: 

for the Interpolation Flag to be used by the Decoder, the Subband Angle
Control Parameters (output of Step 414), and for enabling of Step 418 within the
Encoder, the output of Step 413 before quantization may be used to determine the
rotation angle from subband to subband.

for both the Interpolation Flag and for enabling within the Encoder, the
magnitude output of Step 403, the current DFT magnitudes, may be used to find
isolated peaks at subband boundaries.

Condition 2. If, depending on the presence of a transient, either the
interchannel phase angles (no transient) or the absolute phase angles within a
channel (transient), comprise a good fit to a linear progression:

if the Transient Flag is not true (no transient), use the relative interchannel
bin phase angles from Step 406 for the fit to a linear progression determination,
and

if the Transient Flag is true (transient), use the channel's absolute phase
angles from Step 403.

Decoding 

The steps of a decoding process ("decoding steps") may be described as follows.
With respect to decoding steps, reference is made to FIG. 5, which is in the nature of a
hybrid flowchart and functional block diagram. For simplicity, the figure shows the
derivation of sidechain information components for one channel, it being understood that
sidechain information components must be obtained for each channel unless the channel
is a reference channel for such components, as explained elsewhere.

Step 501. Unpack and Decode Sidechain Information. 

Unpack and decode (including dequantization), as necessary, the sidechain data
components (Amplitude Scale Factors, Angle Control Parameters, Decorrelation Scale
Factors, and Transient Flag) for each frame of each channel (one channel shown in FIG.
5). Table lookups may be used to decode the Amplitude Scale Factors, Angle Control
Parameter, and Decorrelation Scale Factors.

Comment regarding Step 501: As explained above, if a reference channel is
employed, the sidechain data for the reference channel may not include the Angle Control
Parameters, Decorrelation Scale Factors, and Transient Flag.

Step 502. Unpack and Decode Mono Composite or Multichannel Audio
Signal.

Unpack and decode, as necessary, the mono composite or multichannel audio
signal information to provide DFT coefficients for each transform bin of the mono
composite or multichannel audio signal.

Comment regarding Step 502:

Step 501 and Step 502 may be considered to be part of a single unpacking and
decoding step. Step 502 may include a passive or active matrix.

Step 503. Distribute Angle Parameter Values Across Blocks.

Block Subband Angle Control Parameter values are derived from the dequantized
frame Subband Angle Control Parameter values.

Comment regarding Step 503:

Step 503 may be implemented by distributing the same parameter value to every
block in the frame.

Step 504. Distribute Subband Decorrelation Scale Factor Across Blocks. 

Block Subband Decorrelation Scale Factor values are derived from the
dequantized frame Subband Decorrelation Scale Factor values.

Comment regarding Step 504:

Step 504 may be implemented by distributing the same scale factor value to every
block in the frame.

Step 505. Linearly Interpolate Across Frequency. 

Optionally, derive bin angles from the block subband angles of decoder Step 503
by linear interpolation across frequency as described above in connection with encoder
Step 418. Linear interpolation in Step 505 may be enabled when the Interpolation Flag is
used and is true.

Step 506. Add Randomized Phase Angle Offset (Technique 3).

In accordance with Technique 3, described above, when the Transient Flag
indicates a transient, add to the block Subband Angle Control Parameter provided by Step
503, which may have been linearly interpolated across frequency by Step 505, a
randomized offset value scaled by the Decorrelation Scale Factor (the scaling may be
indirect as set forth in this Step):

a. Let y = block Subband Decorrelation Scale Factor.

b. Let z = y^exp, where exp is a constant, for example = 5. z will also be in the
range of 0 to 1, but skewed toward 0, reflecting a bias toward low levels of
randomized variation unless the Decorrelation Scale Factor value is high.

c. Let x = a randomized number between +1.0 and -1.0, chosen separately for
each subband of each block.

d. Then, the value added to the block Subband Angle Control Parameter to add
a randomized angle offset value according to Technique 3 is x * pi * z.
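
Sub-steps a through d of Step 506 may be sketched as follows, using the example exponent of 5. The function name is hypothetical, and the use of Python's pseudo-random generator with a uniform distribution is an assumption made for illustration; as noted in the comments on this step, other sources of randomized values may serve.

```python
import math
import random


def transient_angle_offset(decorrelation_scale, rng, exp=5.0):
    """Randomized angle offset (radians) for one subband of one block."""
    z = decorrelation_scale ** exp     # b: indirect scaling, skewed toward 0
    x = rng.uniform(-1.0, 1.0)         # c: new value per subband per block
    return x * math.pi * z             # d


rng = random.Random(1234)              # seed chosen arbitrarily
offset = transient_angle_offset(0.5, rng)
```

With a Decorrelation Scale Factor of 0.5 the offset is confined to ±π/32, since 0.5 raised to the power 5 is 1/32.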

Comments regarding Step 506:

As will be appreciated by those of ordinary skill in the art, "randomized" angles
(or "randomized" amplitudes if amplitudes are also scaled) for scaling by the Decorrelation
Scale Factor may include not only pseudo-random and truly random variations, but also
deterministically-generated variations that, when applied to phase angles or to phase
angles and to amplitudes, have the effect of reducing cross-correlation between channels.
Such "randomized" variations may be obtained in many ways. For example, a pseudo-random
number generator with various seed values may be employed. Alternatively,
truly random numbers may be generated using a hardware random number generator.
Inasmuch as a randomized angle resolution of only about 1 degree may be sufficient,
tables of randomized numbers having two or three decimal places (e.g. 0.84 or 0.844)
may be employed. Preferably, the randomized values (between -1.0 and +1.0 with
reference to Step 506c, above) are uniformly distributed statistically across each channel.

Although the non-linear indirect scaling of Step 506 has been found to be useful,
it is not critical and other suitable scalings may be employed - in particular other values
for the exponent may be employed to obtain similar results.

When the Subband Decorrelation Scale Factor value is 1, a full range of random
angles from -π to +π are added (in which case the block Subband Angle Control
Parameter values produced by Step 503 are rendered irrelevant). As the Subband
Decorrelation Scale Factor value decreases toward zero, the randomized angle offset also
decreases toward zero, causing the output of Step 506 to move toward the Subband Angle
Control Parameter values produced by Step 503.

If desired, the encoder described above may also add a scaled randomized offset
in accordance with Technique 3 to the angle shift applied to a channel before
downmixing. Doing so may improve alias cancellation in the decoder. It may also be
beneficial for improving the synchronicity of the encoder and decoder.

Step 507. Add Randomized Phase Angle Offset (Technique 2). 

In accordance with Technique 2, described above, when the Transient Flag does
not indicate a transient, for each bin, add to all the block Subband Angle Control
Parameters in a frame provided by Step 503 (Step 506 operates only when the Transient
Flag indicates a transient) a different randomized offset value scaled by the Decorrelation
Scale Factor (the scaling may be direct as set forth herein in this step):

a. Let y = block Subband Decorrelation Scale Factor.

b. Let x = a randomized number between +1.0 and -1.0, chosen separately for
each bin of each frame.

c. Then, the value added to the block bin Angle Control Parameter to add a
randomized angle offset value according to Technique 2 is x * pi * y.
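
The direct scaling of Step 507 may be sketched as follows. The function names are hypothetical. The sketch assumes a fixed table of per-bin randomized values that is generated once and reused, which follows the stated preference that the value for each bin not change with time; only the Decorrelation Scale Factor y then varies from frame to frame.

```python
import math
import random


def make_bin_randoms(num_bins, seed=0):
    """Fixed table of randomized values in [-1, 1], one per bin."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(num_bins)]


def steady_state_bin_angles(bin_angles, bin_randoms, decorrelation_scale):
    """Add x * pi * y to each bin angle (x per bin, y per subband and frame)."""
    return [a + x * math.pi * decorrelation_scale
            for a, x in zip(bin_angles, bin_randoms)]
```

A scale factor of 0 leaves the bin angles unchanged, while a scale factor of 1 allows offsets anywhere in the range -π to +π.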

Comments regarding Step 507:

See comments above regarding Step 506 regarding the randomized angle offset.
Although the direct scaling of Step 507 has been found to be useful, it is not 
critical and other suitable scalings may be employed. 

To minimize temporal discontinuities, the unique randomized angle value for each
bin of each channel preferably does not change with time. The randomized angle values
of all the bins in a subband are scaled by the same Subband Decorrelation Scale Factor
value, which is updated at the frame rate. Thus, when the Subband Decorrelation Scale
Factor value is 1, a full range of random angles from -π to +π are added (in which case
block subband angle values derived from the dequantized frame subband angle values are
rendered irrelevant). As the Subband Decorrelation Scale Factor value diminishes toward
zero, the randomized angle offset also diminishes toward zero. Unlike Step 506, the
scaling in this Step 507 may be a direct function of the Subband Decorrelation Scale
Factor value. For example, a Subband Decorrelation Scale Factor value of 0.5
proportionally reduces every random angle variation by 0.5.

The scaled randomized angle value may then be added to the bin angle from
decoder Step 506. The Decorrelation Scale Factor value is updated once per frame. In 
the presence of a Transient Flag for the frame, this step is skipped, to avoid transient 
prenoise artifacts. 

If desired, the encoder described above may also add a scaled randomized offset
in accordance with Technique 2 to the angle shift applied before downmixing. Doing so
may improve alias cancellation in the decoder. It may also be beneficial for improving 
the synchronicity of the encoder and decoder. 

Step 508. Normalize Amplitude Scale Factors. 

Normalize Amplitude Scale Factors across channels so that they sum-square to 1.

Comment regarding Step 508:

For example, if two channels have dequantized scale factors of -3.0 dB (= 2 * 
granularity of 1.5 dB) (.70795), the sum of the squares is 1.002. Dividing each by the 
square root of 1.002 = 1.001 yields two values of .7072 (-3.01 dB). 
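
The normalization of Step 508 and the two-channel example above may be sketched as follows (hypothetical function names). The scale factors are divided by the square root of the sum of their squares, so that the squares of the results sum to 1.

```python
import math


def db_to_linear(db):
    """Convert a level in dB to a linear amplitude scale factor."""
    return 10.0 ** (db / 20.0)


def normalize_scale_factors(scale_factors):
    """Scale the per-channel amplitude factors so their squares sum to 1."""
    norm = math.sqrt(sum(s * s for s in scale_factors))
    if norm == 0.0:
        return list(scale_factors)     # nothing to normalize
    return [s / norm for s in scale_factors]


factor = db_to_linear(-3.0)            # about 0.70795
normalized = normalize_scale_factors([factor, factor])
```

Computed without intermediate rounding, the two normalized values are each about 0.7071 (-3.01 dB).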

Step 509. Boost Subband Scale Factor Levels (Optional). 

Optionally, when the Transient Flag indicates no transient, apply a slight
additional boost to Subband Scale Factor levels, dependent on Subband Decorrelation
Scale Factor levels: multiply each normalized Subband Amplitude Scale Factor by a
small factor (e.g., 1 + 0.2 * Subband Decorrelation Scale Factor). When the Transient
Flag is True, skip this step.

Comment regarding Step 509: 

This step may be useful because the decoder decorrelation Step 507 may result in 
slightly reduced levels in the final inverse filterbank process. 

Step 510. Distribute Subband Amplitude Values Across Bins.

Step 510 may be implemented by distributing the same subband amplitude scale
factor value to every bin in the subband.

Step 510a. Add Randomized Amplitude Offset (Optional).

Optionally, apply a randomized variation to the normalized Subband Amplitude
Scale Factor dependent on Subband Decorrelation Scale Factor levels and the Transient
Flag. In the absence of a transient, add a Randomized Amplitude Scale Factor that does
not change with time on a bin-by-bin basis (different from bin to bin), and, in the
presence of a transient (in the frame or block), add a Randomized Amplitude Scale Factor
that changes on a block-by-block basis (different from block to block) and changes from
subband to subband (the same shift for all bins in a subband; different from subband to
subband). Step 510a is not shown in the drawings.

Comment regarding Step 510a:

Although the degree to which randomized amplitude shifts are added may be
controlled by the Decorrelation Scale Factor, it is believed that a particular scale factor
value should cause less amplitude shift than the corresponding randomized phase shift
resulting from the same scale factor value in order to avoid audible artifacts.

Step 511. Upmix. 

a. For each bin of each output channel, construct a complex upmix scale
factor from the amplitude of decoder Step 508 and the bin angle of decoder
Step 507: (amplitude * (cos (angle) + j sin (angle))).

b. For each output channel, multiply the complex bin value and the
complex upmix scale factor to produce the upmixed complex output bin value of
each bin of the channel.
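
For a single bin of a single output channel, the upmix of Step 511 may be sketched as follows (the function name is hypothetical). `cmath.rect(amplitude, angle)` returns amplitude * (cos (angle) + j sin (angle)).

```python
import cmath


def upmix_bin(composite_bin, amplitude, angle):
    """Return the upmixed complex bin value for one output channel.

    composite_bin: complex bin value of the decoded composite signal.
    amplitude: amplitude scale factor for this bin.
    angle: bin angle in radians for this bin.
    """
    scale = cmath.rect(amplitude, angle)   # a: complex upmix scale factor
    return composite_bin * scale           # b: apply it to the bin value
```

For example, a composite bin of 1 with an amplitude of 0.5 and an angle of π/2 yields an output bin of 0.5j.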

Step 512. Perform Inverse DFT (Optional). 

Optionally, perform an inverse DFT transform on the bins of each output channel
to yield multichannel output PCM values. As is well known, in connection with such an
inverse DFT transformation, the individual blocks of time samples are windowed, and
adjacent blocks are overlapped and added together in order to reconstruct the final
continuous time output PCM audio signal.

Comments regarding Step 512:
A decoder according to the present invention may not provide PCM outputs. In
the case where the decoder process is employed only above a given coupling frequency,
and discrete MDCT coefficients are sent for each channel below that frequency, it may be
desirable to convert the DFT coefficients derived by the decoder upmixing Steps 511a
and 511b to MDCT coefficients, so that they can be combined with the lower frequency
discrete MDCT coefficients and requantized in order to provide, for example, a bitstream
compatible with an encoding system that has a large number of installed users, such as a
standard AC-3 S/PDIF bitstream for application to an external device where an inverse
transform may be performed. An inverse DFT transform may be applied to ones of the
output channels to provide PCM outputs.

Section 8.2.2 of the A/52A Document
With Sensitivity Factor "F" Added

8.2.2. Transient detection

Transients are detected in the full-bandwidth channels in order to decide when to
switch to short length audio blocks to improve pre-echo performance. High-pass filtered
versions of the signals are examined for an increase in energy from one sub-block time-segment
to the next. Sub-blocks are examined at different time scales. If a transient is
detected in the second half of an audio block in a channel, that channel switches to a short
block. A channel that is block-switched uses the D45 exponent strategy [i.e., the data has
a coarser frequency resolution in order to reduce the data overhead resulting from the
increase in temporal resolution].

The transient detector is used to determine when to switch from a long transform
block (length 512), to the short block (length 256). It operates on 512 samples for every
audio block. This is done in two passes, with each pass processing 256 samples. Transient
detection is broken down into four steps: 1) high-pass filtering, 2) segmentation of the
block into submultiples, 3) peak amplitude detection within each sub-block segment, and
4) threshold comparison. The transient detector outputs a flag blksw[n] for each full-bandwidth
channel, which when set to "one" indicates the presence of a transient in the
second half of the 512 length input block for the corresponding channel.

1) High-pass filtering: The high-pass filter is implemented as a cascaded
biquad direct form II IIR filter with a cutoff of 8 kHz.

2) Block Segmentation: The block of 256 high-pass filtered samples is
segmented into a hierarchical tree of levels in which level 1 represents the 256
length block, level 2 is two segments of length 128, and level 3 is four segments
of length 64.

3) Peak Detection: The sample with the largest magnitude is identified for
each segment on every level of the hierarchical tree. The peaks for a single level
are found as follows:

$$P[j][k] = \max(x(n))$$

for $n = 512(k-1)/2^{j},\ 512(k-1)/2^{j} + 1,\ \ldots,\ 512k/2^{j} - 1$
and $k = 1, \ldots, 2^{j-1}$;

where: x(n) = the nth sample in the 256 length block
j = 1, 2, 3 is the hierarchical level number
k = the segment number within level j

Note that P[j][0], (i.e., k=0) is defined to be the peak of the last
segment on level j of the tree calculated immediately prior to the current
tree. For example, P[3][4] in the preceding tree is P[3][0] in the current
tree.

4) Threshold Comparison: The first stage of the threshold comparator
checks to see if there is significant signal level in the current block. This is done
by comparing the overall peak value P[1][1] of the current block to a "silence
threshold". If P[1][1] is below this threshold then a long block is forced. The silence
threshold value is 100/32768. The next stage of the comparator checks the relative
peak levels of adjacent segments on each level of the hierarchical tree. If the peak
ratio of any two adjacent segments on a particular level exceeds a pre-defined
threshold for that level, then a flag is set to indicate the presence of a transient in
the current 256-length block. The ratios are compared as follows:

mag(P[j][k]) x T[j] > (F * mag(P[j][(k-1)])) [Note the "F" sensitivity factor]

where: T[j] is the pre-defined threshold for level j, defined as:
T[1] = .1
T[2] = .075
T[3] = .05

If this inequality is true for any two segment peaks on any level, 
then a transient is indicated for the first half of the 512 length input block. 
The second pass through this process determines the presence of transients 
in the second half of the 512 length input block. 
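
The segmentation, peak detection and threshold comparison steps above can be sketched for one 256-sample pass as follows. This is a minimal illustration, not the detector of any particular implementation: the function names are hypothetical, the input is assumed to be already high-pass filtered, and the default value of the sensitivity factor F (1.0, which leaves the comparison unscaled) is an assumption because no value is given in this passage.

```python
T = {1: 0.1, 2: 0.075, 3: 0.05}      # per-level thresholds T[j] from the text
SILENCE_THRESHOLD = 100 / 32768      # silence threshold from the text


def peak_tree(x):
    """Return {j: [P[j][1], ..., P[j][2^(j-1)]]} for a 256-sample block."""
    if len(x) != 256:
        raise ValueError("expected a 256-sample block")
    tree = {}
    for j in (1, 2, 3):
        peaks = []
        for k in range(1, 2 ** (j - 1) + 1):
            start = 512 * (k - 1) // 2 ** j   # first n of segment k
            stop = 512 * k // 2 ** j          # one past the last n
            peaks.append(max(abs(v) for v in x[start:stop]))
        tree[j] = peaks
    return tree


def detect_transient(x, prev_tree, sensitivity=1.0):
    """Return (flag, tree) for one pass; sensitivity plays the role of F."""
    tree = peak_tree(x)
    # Stage 1: a block whose overall peak P[1][1] is below the silence
    # threshold never signals a transient.
    if tree[1][0] < SILENCE_THRESHOLD:
        return False, tree
    # Stage 2: compare adjacent segment peaks on every level.
    for j in (1, 2, 3):
        # P[j][0] is the last segment peak on level j of the previous tree.
        peaks = [prev_tree[j][-1]] + tree[j]
        for k in range(1, len(peaks)):
            if peaks[k] * T[j] > sensitivity * peaks[k - 1]:
                return True, tree
    return False, tree
```

Calling detect_transient once per 256-sample half, and passing the returned tree as prev_tree to the next call, mirrors the two-pass processing of the 512-length block described above.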

N:M Encoding 

Aspects of the present invention are not limited to N:1 encoding as described in
connection with FIG. 1. More generally, aspects of the invention are applicable to the
transformation of any number of input channels (n input channels) to any number of
output channels (m output channels) in the manner of FIG. 6 (i.e., N:M encoding).
Because in many common applications the number of input channels n is greater than the
number of output channels m, the N:M encoding arrangement of FIG. 6 will be referred
to as "downmixing" for convenience in description.

Referring to the details of FIG. 6, instead of summing the outputs of Rotate Angle
8 and Rotate Angle 10 in the Additive Combiner 6 as in the arrangement of FIG. 1, those
outputs may be applied to a downmix matrix device or function 6' ("Downmix Matrix").
Downmix Matrix 6' may be a passive or active matrix that provides either a simple
summation to one channel, as in the N:1 encoding of FIG. 1, or to multiple channels. The
matrix coefficients may be real or complex (real and imaginary). Other devices and
functions in FIG. 6 may be the same as in the FIG. 1 arrangement and they bear the same
reference numerals.
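
As an illustration of a passive Downmix Matrix with real or complex coefficients, the following sketch applies an m x n coefficient matrix to n input channels, bin by bin. The function name and the data layout (one list of complex frequency bins per channel) are assumptions made for the example only.

```python
def downmix(matrix, channels):
    """Apply an m x n matrix to n channels of equal-length complex bins.

    matrix   : m rows, each with n real or complex coefficients
    channels : n lists of frequency bins
    returns  : m lists of frequency bins
    """
    n = len(channels)
    if any(len(row) != n for row in matrix):
        raise ValueError("each matrix row needs one coefficient per input channel")
    num_bins = len(channels[0])
    if any(len(ch) != num_bins for ch in channels):
        raise ValueError("all channels must have the same number of bins")
    return [
        [sum(row[i] * channels[i][b] for i in range(n)) for b in range(num_bins)]
        for row in matrix
    ]
```

With the single-row matrix [[1, 1]] this reduces to the simple summation of the N:1 case; additional rows produce additional output channels.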

Downmix Matrix 6' may provide a hybrid frequency-dependent function such that
it provides, for example, m(f1-f2) channels in a frequency range f1 to f2 and m(f2-f3)
channels in a frequency range f2 to f3. For example, below a coupling frequency of, for
example, 1000 Hz the Downmix Matrix 6' may provide two channels and above the coupling
frequency the Downmix Matrix 6' may provide one channel. By employing two channels
below the coupling frequency, better spatial fidelity may be obtained, especially if the
two channels represent horizontal directions (to match the horizontality of the human
ears).
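
A hedged sketch of such a frequency-dependent downmix for two input channels is shown below. The two matrices (a pass-through below the coupling frequency and a plain sum above it), the function name, and the choice to treat a bin lying exactly at the coupling frequency as "above" are all assumptions for illustration.

```python
def hybrid_downmix(channels, bin_freqs_hz, coupling_hz=1000.0):
    """Return, for each bin, the list of output values for that bin.

    Bins below coupling_hz yield two outputs; other bins yield one.
    """
    low_matrix = [[1.0, 0.0], [0.0, 1.0]]   # assumed: keep both channels
    high_matrix = [[1.0, 1.0]]              # assumed: sum to one channel
    outputs = []
    for b, freq in enumerate(bin_freqs_hz):
        matrix = low_matrix if freq < coupling_hz else high_matrix
        outputs.append(
            [sum(c * ch[b] for c, ch in zip(row, channels)) for row in matrix]
        )
    return outputs
```

The per-bin result therefore has length two in the lower range and length one in the upper range, which matches the two-channel and one-channel example given above.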

Although FIG. 6 shows the generation of the same sidechain information for each
channel as in the FIG. 1 arrangement, it may be possible to omit certain ones of the
sidechain information when more than one channel is provided by the output of the
Downmix Matrix 6'. In some cases, acceptable results may be obtained when only the
amplitude scale factor sidechain information is provided by the FIG. 6 arrangement.
Further details regarding sidechain options are discussed below in connection with the
descriptions of FIGS. 7, 8 and 9.

As just mentioned above, the multiple channels generated by the Downmix Matrix
6' need not be fewer than the number of input channels n. When the purpose of an
encoder such as in FIG. 6 is to reduce the number of bits for transmission or storage, it is
likely that the number of channels produced by Downmix Matrix 6' will be fewer than the
number of input channels n. However, the arrangement of FIG. 6 may also be used as an
"upmixer." In that case, there may be applications in which the number of channels m
produced by the Downmix Matrix 6' is more than the number of input channels n.

Encoders as described in connection with the examples of FIGS. 2, 5 and 6 may
also include their own local decoder or decoding function in order to determine if the
audio information and the sidechain information, when decoded by such a decoder, would
provide suitable results. The results of such a determination could be used to improve the
parameters by employing, for example, a recursive process. In a block encoding and
decoding system, recursion calculations could be performed, for example, on every block
before the next block ends in order to minimize the delay in transmitting a block of audio
information and its associated spatial parameters.

An arrangement in which the encoder also includes its own decoder or decoding
function could also be employed advantageously when spatial parameters are stored
or sent only for certain blocks. If unsuitable decoding would result from not sending
spatial-parameter sidechain information, such sidechain information would be sent for the
particular block. In this case, the decoder may be a modification of the decoder or
decoding function of FIGS. 2, 5 or 6 in that the decoder would have the ability both to
recover spatial-parameter sidechain information for frequencies above the coupling
frequency from the incoming bitstream and to generate simulated spatial-parameter
sidechain information from the stereo information below the coupling frequency.

In a simplified alternative to such local-decoder-incorporating encoder examples,
rather than having a local decoder or decoder function, the encoder could simply check to
determine if there were any signal content below the coupling frequency (determined in
any suitable way, for example, a sum of the energy in frequency bins through the
frequency range), and, if not, it would send or store spatial-parameter sidechain
information rather than not doing so if the energy were above the threshold. Depending
on the encoding scheme, low signal information below the coupling frequency may also
result in more bits being available for sending sidechain information.
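
The energy check described in this simplified alternative might look like the following sketch. The function name, the per-bin frequency list and the threshold value passed by the caller are hypothetical; the passage does not specify how the threshold is chosen.

```python
def should_send_sidechain(bins, bin_freqs_hz, coupling_hz, energy_threshold):
    """Decide whether to send or store spatial-parameter sidechain information.

    Sums the energy (squared magnitude) of the bins below the coupling
    frequency and returns True when that energy does not exceed the
    threshold, i.e. when there is effectively no low-frequency content.
    """
    low_energy = sum(
        abs(value) ** 2
        for value, freq in zip(bins, bin_freqs_hz)
        if freq < coupling_hz
    )
    return low_energy <= energy_threshold
```

For example, a block whose only significant bins lie above the coupling frequency returns True, so sidechain information would be sent for that block.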

M:N Decoding

A more generalized form of the arrangement of FIG. 2 is shown in FIG. 7,
wherein an upmix matrix function or device ("Upmix Matrix") 20 receives the 1 to m
channels generated by the arrangement of FIG. 6. The Upmix Matrix 20 may be a
passive matrix. It may be, but need not be, the conjugate transposition (i.e., the
complement) of the Downmix Matrix 6' of the FIG. 6 arrangement. Alternatively, the
Upmix Matrix 20 may be an active matrix - a variable matrix or a passive matrix in
combination with a variable matrix. If an active matrix decoder is employed, in its
relaxed or quiescent state it may be the complex conjugate of the Downmix Matrix or it
may be independent of the Downmix Matrix. The sidechain information may be applied
as shown in FIG. 7 so as to control the Adjust Amplitude, Rotate Angle, and (optional)
Interpolator functions or devices. In that case, the Upmix Matrix, if an active matrix,
operates independently of the sidechain information and responds only to the channels
applied to it. Alternatively, some or all of the sidechain information may be applied to
the active matrix to assist its operation. In that case, some or all of the Adjust Amplitude,
Rotate Angle, and Interpolator functions or devices may be omitted. The Decoder
example of FIG. 7 may also employ the alternative of applying a degree of randomized
amplitude variations under certain signal conditions, as described above in connection
with FIGS. 2 and 5.
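
For the passive case in which the Upmix Matrix is the conjugate transposition of the Downmix Matrix, a minimal sketch is given below. The helper names are hypothetical and no gain normalization is applied, which a practical decoder would likely need; that omission is a simplification of this example.

```python
def conjugate_transpose(matrix):
    """Return the conjugate transpose of an m x n matrix (as an n x m matrix)."""
    rows, cols = len(matrix), len(matrix[0])
    return [
        [complex(matrix[r][c]).conjugate() for r in range(rows)]
        for c in range(cols)
    ]


def apply_matrix(matrix, vector):
    """Multiply a matrix by a vector holding one bin value per input channel."""
    return [sum(c * v for c, v in zip(row, vector)) for row in matrix]
```

With a one-row downmix matrix [[1, 1j]], the derived upmix matrix has two rows, so a single received channel is expanded back to two output channels.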

When Upmix Matrix 20 is an active matrix, the arrangement of FIG. 7 may be
characterized as a "hybrid matrix decoder" for operating in a "hybrid matrix
encoder/decoder system." "Hybrid" in this context refers to the fact that the decoder may
derive some measure of control information from its input audio signal (i.e., the active
matrix responds to spatial information encoded in the channels applied to it) and a further
measure of control information from spatial-parameter sidechain information. Other
elements of FIG. 7 are as in the arrangement of FIG. 2 and bear the same reference
numerals.

Suitable active matrix decoders for use in a hybrid matrix decoder may include
active matrix decoders such as those mentioned above and incorporated by reference,
including, for example, matrix decoders known as "Pro Logic" and "Pro Logic II"
decoders ("Pro Logic" is a trademark of Dolby Laboratories Licensing Corporation).

Alternative Decorrelation 

FIGS. 8 and 9 show variations on the generalized Decoder of FIG. 7. In
particular, both the arrangement of FIG. 8 and the arrangement of FIG. 9 show
alternatives to the decorrelation technique of FIGS. 2 and 7. In FIG. 8, respective
decorrelator functions or devices ("Decorrelators") 46 and 48 are in the time domain,
each following the respective Inverse Filterbank 30 and 36 in their channel. In FIG. 9,
respective decorrelator functions or devices ("Decorrelators") 50 and 52 are in the
frequency domain, each preceding the respective Inverse Filterbank 30 and 36 in their
channel. In both the FIG. 8 and FIG. 9 arrangements, each of the Decorrelators (46, 48,
50, 52) has a unique characteristic so that their outputs are mutually decorrelated with
respect to each other. The Decorrelation Scale Factor may be used to control, for
example, the ratio of decorrelated to uncorrelated signal provided in each channel.
Optionally, the Transient Flag may also be used to shift the mode of operation of the
Decorrelator, as is explained below. In both the FIG. 8 and FIG. 9 arrangements, each
Decorrelator may be a Schroeder-type reverberator having its own unique filter
characteristic, in which the amount or degree of reverberation is controlled by the
decorrelation scale factor (implemented, for example, by controlling the degree to which
the Decorrelator output forms a part of a linear combination of the Decorrelator input and
output). Alternatively, other controllable decorrelation techniques may be employed
either alone or in combination with each other or with a Schroeder-type reverberator.
Schroeder-type reverberators are well known and may trace their origin to two journal
papers: "'Colorless' Artificial Reverberation" by M.R. Schroeder and B.F. Logan, IRE
Transactions on Audio, vol. AU-9, pp. 209-214, 1961 and "Natural Sounding Artificial
Reverberation" by M.R. Schroeder, Journal of the Audio Engineering Society, July 1962,
vol. 10, no. 2, pp. 219-223.
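
One way to picture a time-domain Decorrelator of this kind is a chain of Schroeder all-pass sections whose output is linearly combined with the input under control of the decorrelation scale factor. The sketch below is only an illustration: the delay lengths, the feedback gain, the function names, and the assumption that the scale factor lies between 0 and 1 are not taken from the text.

```python
def schroeder_allpass(x, delay, gain):
    """One Schroeder all-pass section: y[n] = -g*x[n] + x[n-D] + g*y[n-D]."""
    y = []
    for n, sample in enumerate(x):
        x_delayed = x[n - delay] if n >= delay else 0.0
        y_delayed = y[n - delay] if n >= delay else 0.0
        y.append(-gain * sample + x_delayed + gain * y_delayed)
    return y


def decorrelate(x, scale_factor, delays=(113, 337), gain=0.7):
    """Mix the input with its all-pass-filtered version.

    scale_factor = 0 returns the input unchanged; scale_factor = 1 returns
    only the all-pass chain output. delays and gain are assumed values; a
    different set per channel would give each Decorrelator its own
    characteristic.
    """
    wet = list(x)
    for delay in delays:
        wet = schroeder_allpass(wet, delay, gain)
    return [
        (1.0 - scale_factor) * dry + scale_factor * w
        for dry, w in zip(x, wet)
    ]
```

Using different delay sets for the two channels is one simple way to make the Decorrelator outputs differ from each other, in the spirit of the "unique characteristic" mentioned above.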

When the Decorrelators 46 and 48 operate in the time domain, as in the FIG. 8
arrangement, a single (i.e., wideband) Decorrelation Scale Factor is required. This may
be obtained by any of several ways. For example, only a single Decorrelation Scale
Factor may be generated in the encoder of FIG. 1 or FIG. 7. Alternatively, if the encoder
of FIG. 1 or FIG. 7 generates Decorrelation Scale Factors on a subband basis, the
Subband Decorrelation Scale Factors may be amplitude or power summed in the encoder
of FIG. 1 or FIG. 7 or in the decoder of FIG. 8.

When the Decorrelators 50 and 52 operate in the frequency domain, as in the FIG.
9 arrangement, they may receive a decorrelation scale factor for each subband or groups
of subbands and, concomitantly, provide a commensurate degree of decorrelation for such
subbands or groups of subbands.

The Decorrelators 46 and 48 of FIG. 8 and the Decorrelators 50 and 52 of FIG. 9
may optionally receive the Transient Flag. In the time-domain Decorrelators of FIG. 8,
the Transient Flag may be employed to shift the mode of operation of the respective
Decorrelator. For example, the Decorrelator may operate as a Schroeder-type
reverberator in the absence of the transient flag but upon its receipt and for a short
subsequent time period, say 1 to 10 milliseconds, operate as a fixed delay. Each channel
may have a predetermined fixed delay or the delay may be varied in response to a
plurality of transients within a short time period. In the frequency-domain Decorrelators
of FIG. 9, the transient flag may also be employed to shift the mode of operation of the
respective Decorrelator. However, in this case, the receipt of a transient flag may, for
example, trigger a short (several milliseconds) increase in amplitude in the channel in
which the flag occurred.
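
The mode shift of the time-domain Decorrelators can be pictured as a small state machine that holds a "fixed delay" mode for a short period after each Transient Flag. The sketch below is an assumption-laden illustration: the class name, the block-based granularity, the sample rate and the 5 ms default hold time (chosen from within the 1 to 10 millisecond range mentioned above) are all hypothetical.

```python
class DecorrelatorModeSelector:
    """Choose the Decorrelator mode for each block of samples."""

    def __init__(self, sample_rate_hz, hold_ms=5.0):
        # Number of samples for which the fixed-delay mode is held
        # after a transient flag is received.
        self.hold_samples = int(round(sample_rate_hz * hold_ms / 1000.0))
        self.remaining = 0

    def mode_for_block(self, block_len, transient_flag):
        """Return "fixed_delay" or "reverberator" for the next block."""
        if transient_flag:
            self.remaining = self.hold_samples   # restart the hold period
        if self.remaining > 0:
            self.remaining = max(0, self.remaining - block_len)
            return "fixed_delay"
        return "reverberator"
```

A caller would route each block either through a plain delay line or through the reverberator according to the returned mode; a new flag arriving during the hold period simply restarts it.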

In both the FIG. 8 and 9 arrangements, an Interpolator 27 (33), controlled by the
optional Transient Flag, may provide interpolation across frequency of the phase angles
output of Rotate Angle 28 (34) in a manner as described above.

As mentioned above, when two or more channels are sent in addition to sidechain
information, it may be acceptable to reduce the number of sidechain parameters. For
example, it may be acceptable to send only the Amplitude Scale Factor, in which case the
decorrelation and angle devices or functions in the decoder may be omitted (in that case,
FIGS. 7, 8 and 9 reduce to the same arrangement).

Alternatively, only the amplitude scale factor, the Decorrelation Scale Factor, and,
optionally, the Transient Flag may be sent. In that case, any of the FIG. 7, 8 or 9 
arrangements may be employed (omitting the Rotate Angle 28 and 34 in each of them). 

As another alternative, only the amplitude scale factor and the angle control
parameter may be sent. In that case, any of the FIG. 7, 8 or 9 arrangements may be
employed (omitting the Decorrelator 38 and 42 of FIG. 7 and 46, 48, 50, 52 of FIGS. 8 
and 9). 

As in FIGS. 1 and 2, the arrangements of FIGS. 6-9 are intended to show any 
number of input and output channels although, for simplicity in presentation, only two 
channels are shown. 

It should be understood that implementation of other variations and modifications 
of the invention and its various aspects will be apparent to those skilled in the art, and that 
the invention is not limited by these specific embodiments described. It is therefore 
contemplated to cover by the present invention any and all modifications, variations, or
equivalents that fall within the true spirit and scope of the basic underlying principles
disclosed herein. 



